(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)







(19) World Intellectual Property Organization 
International Bureau

(43) International Publication Date: 17 January 2002 (17.01.2002)

(10) International Publication Number: PCT WO 02/04493 A2



(51) International Patent Classification⁷: C07K 14/155

(21) International Application Number: PCT/US01/21241

(22) International Filing Date: 5 July 2001 (05.07.2001)

(25) Filing Language: English 

(26) Publication Language: English 

(30) Priority Data: 

09/610,313 5 July 2000 (05.07.2000) US 

(71) Applicants (for all designated States except US): CHIRON
CORPORATION [US/US]; 4560 Horton Street,
Emeryville, CA 94608 (US). UNIVERSITY OF STELLENBOSCH
[ZA/ZA]; P.O. Box 19063, 7505 Tygerberg
(ZA).

(72) Inventors; and 

(75) Inventors/Applicants (for US only): ZUR MEGEDE, 
Jan [DE/US]; c/o Chiron Corporation, 4560 Horton Street
- R440, Emeryville, CA 94608 (US). BARNETT, Susan, 
W. [US/US]; c/o Chiron Corporation, 4560 Horton Street
- R440, Emeryville, CA 94608 (US). ENGELBRECHT,
Susan [ZA/ZA]; c/o University of Stellenbosch, P.O. 
Box 19063, 7505 Tygerberg (ZA). VAN RENSBURG, 
Estrelita, Janse [ZA/ZA]; c/o University of Stellenbosch, 
P.O. Box 19063, 7505 Tygerberg (ZA). 



(74) Agents: DOLLARD, Anne, S. et al.; Chiron Corporation, 
Intellectual Property - R440, P.O. Box 8097, Emeryville, 
CA 94662-8097 (US). 

(81) Designated States (national): AE, AG, AL, AM, AT, AU,
AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CO, CR, CU,
CZ, DE, DK, DM, DZ, EC, EE, ES, FI, GB, GD, GE, GH,
GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC,
LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW,
MX, MZ, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK,
SL, TJ, TM, TR, TT, TZ, UA, UG, US, UZ, VN, YU, ZA,
ZW.



(84) Designated States (regional): ARIPO patent (GH, GM,
KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZW), Eurasian
patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European 
patent (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, 
IT, LU, MC, NL, PT, SE, TR), OAPI patent (BF, BJ, CF, 
CG, CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG). 



Published: 

- without international search report and to be republished
upon receipt of that report

For two-letter codes and other abbreviations, refer to the "Guidance
Notes on Codes and Abbreviations" appearing at the beginning
of each regular issue of the PCT Gazette.




(54) Title: POLYNUCLEOTIDES ENCODING ANTIGENIC HIV TYPE C POLYPEPTIDES, POLYPEPTIDES AND USES
THEREOF

(57) Abstract: The present invention relates to polynucleotides encoding immunogenic HIV type C polypeptides. Uses of the
polynucleotides in applications including DNA immunization, generation of packaging cell lines, and production of HIV Type C
proteins are also described.



WO 02/04493 PCT/US01/21241



POLYNUCLEOTIDES ENCODING ANTIGENIC HIV TYPE C POLYPEPTIDES, 
POLYPEPTIDES AND USES THEREOF 

Technical Field 

Polynucleotides encoding antigenic Type C HIV polypeptides (e.g., Gag, pol, vif, vpr,
tat, rev, vpu, env, and nef) are described, as are uses of these polynucleotides and polypeptide
products in immunogenic compositions. Also described are polynucleotide sequences from
South African variants of HIV Type C.

Background of the Invention

Acquired immune deficiency syndrome (AIDS) is recognized as one of the greatest
health threats facing modern medicine. There is, as yet, no cure for this disease.
In 1983-1984, three groups independently identified the suspected etiological agent of AIDS.
See, e.g., Barre-Sinoussi et al. (1983) Science 220:868-871; Montagnier et al., in Human
T-Cell Leukemia Viruses (Gallo, Essex & Gross, eds., 1984); Vilmer et al. (1984) The
Lancet 1:753; Popovic et al. (1984) Science 224:497-500; Levy et al. (1984) Science
225:840-842. These isolates were variously called lymphadenopathy-associated virus (LAV),
human T-cell lymphotropic virus type III (HTLV-III), or AIDS-associated retrovirus (ARV).
All of these isolates are strains of the same virus and were later collectively named Human
Immunodeficiency Virus (HIV). With the isolation of a related AIDS-causing virus, the
strains originally called HIV are now termed HIV-1, and the related virus is called HIV-2. See,
e.g., Guyader et al. (1987) Nature 326:662-669; Brun-Vezinet et al. (1986) Science
233:343-346; Clavel et al. (1986) Nature 324:691-695.

A great deal of information has been gathered about the HIV virus; however, to date,
an effective vaccine has not been identified. Several targets for vaccine development have
been examined, including the env and Gag gene products encoded by HIV. Gag gene products
include, but are not limited to, Gag-polymerase and Gag-protease. Env gene products
include, but are not limited to, monomeric gp120 polypeptides, oligomeric gp140
polypeptides and gp160 polypeptides.

Haas, et al., (Current Biology 6(3):315-324, 1996) suggested that selective codon
usage by HIV-1 appeared to account for a substantial fraction of the inefficiency of viral
protein synthesis. Andre, et al., (J Virol 72(2):1497-1503, 1998) described an increased
immune response elicited by DNA vaccination employing a synthetic gp120 sequence with
modified codon usage. Schneider, et al., (J Virol 71(7):4892-4903, 1997) discuss
inactivation of inhibitory (or instability) elements (INS) located within the Gag and
Gag-protease coding sequences.
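At its simplest, the codon-usage modification described by Haas et al. and Andre et al. can be pictured as synonymous recoding. Each codon is swapped for a host-preferred codon that encodes the same amino acid, so the protein sequence is unchanged. The minimal Python sketch below assumes the standard genetic code. The `PREFERRED` table is a hypothetical, illustrative subset, not a measured host codon-usage table. The sketch does not attempt the inactivation of inhibitory (INS) elements discussed by Schneider et al.

```python
import itertools

# Standard genetic code in the conventional TCAG ordering
# (first base varies slowest, third base fastest); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical preferred codons for a projected host cell (illustrative only).
PREFERRED = {"L": "CTG", "K": "AAG", "E": "GAG", "A": "GCC", "G": "GGC"}


def codons(seq: str) -> list:
    """Split an in-frame DNA sequence into codons."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq: str) -> str:
    """Translate an in-frame DNA sequence with the standard code."""
    return "".join(CODON_TABLE[c] for c in codons(seq))


def recode(seq: str, preferred: dict = PREFERRED) -> str:
    """Replace each codon with the listed synonymous codon; keep it otherwise."""
    out = []
    for c in codons(seq):
        aa = CODON_TABLE[c]
        out.append(preferred.get(aa, c))
    return "".join(out)


wild_type = "TTAAAAGAAGCT"  # Leu-Lys-Glu-Ala
synthetic = recode(wild_type)
print(synthetic)  # CTGAAGGAGGCC
print(translate(wild_type) == translate(synthetic))  # True
```

Because only synonymous codons are substituted, `translate` returns the same protein for the wild-type and recoded sequences. That invariant is what distinguishes codon optimization from mutagenesis.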
The Gag proteins of HIV-1 are necessary for the assembly of virus-like particles.
HIV-1 Gag proteins are involved in many stages of the life cycle of the virus, including
assembly, virion maturation after particle release, and early post-entry steps in virus
replication. The roles of HIV-1 Gag proteins are numerous and complex (Freed, E.O.,
Virology 251:1-15, 1998).

Wolf, et al., (PCT International Application, WO 96/30523, published 3 October
1996; European Patent Application, Publication No. 0 449 116 A1, published 2 October
1991) have described the use of altered pr55 Gag of HIV-1 to act as a non-infectious
retroviral-like particulate carrier, in particular, for the presentation of immunologically
important epitopes. Wang, et al., (Virology 200:524-534, 1994) describe a system to study
assembly of HIV Gag-β-galactosidase fusion proteins into virions. They describe the
construction of sequences encoding HIV Gag-β-galactosidase fusion proteins, the expression
of such sequences in the presence of HIV Gag proteins, and assembly of these proteins into
virus particles.

Shiver, et al., (PCT International Application, WO 98/34640, published 13 August
1998) described altering HIV-1 (CAM1) Gag coding sequences to produce synthetic DNA
molecules encoding HIV Gag and modifications of HIV Gag. The codons of the synthetic
molecules were codons preferred by a projected host cell.

Recently, use of HIV Env polypeptides in immunogenic compositions has been
described (see U.S. Patent No. 5,846,546 to Hurwitz et al., issued December 8, 1998,
describing immunogenic compositions comprising a mixture of at least four different
recombinant viruses that each express a different HIV env variant; and U.S. Patent No.
5,840,313 to Vahlne et al., issued November 24, 1998, describing peptides which correspond
to epitopes of the HIV-1 gp120 protein). In addition, U.S. Patent No. 5,876,731 to Sia et al.,
issued March 2, 1999, describes candidate vaccines against HIV comprising an amino acid
sequence of a T-cell epitope of Gag linked directly to an amino acid sequence of a B-cell
epitope of the V3 loop protein of an HIV-1 isolate containing the sequence GPGR. There
remains a need for antigenic HIV polypeptides, particularly those from Type C isolates.

Summary of the Invention

Described herein are novel Type C HIV sequences, for example, 8_5_TV1_C.ZA,
8_2_TV1_C.ZA and 12-5_1_TV2_C.ZA, polypeptides encoded by these novel sequences,
and synthetic expression cassettes generated from these and other Type C HIV sequences.

In certain embodiments, the present invention relates to synthetic expression cassettes
encoding HIV Type C polypeptides, including Env, Gag, Pol, Prot, Vpr, Vpu, Vif, Nef, Tat,
Rev and/or fragments thereof. The present invention also relates to improved
expression of HIV Type C polypeptides and production of virus-like particles. Synthetic
expression cassettes encoding the HIV polypeptides (e.g., Gag-, pol-, protease (prot)-, reverse
transcriptase-, integrase-, RNaseH-, Tat-, Rev-, Nef-, Vpr-, Vpu-, Vif- and/or Env-containing
polypeptides) are described, as are uses of the expression cassettes.

Thus, one aspect of the present invention relates to expression cassettes and
polynucleotides contained therein. The expression cassettes typically include an HIV
polypeptide-encoding sequence inserted into an expression vector backbone. In one
embodiment, an expression cassette comprises a polynucleotide sequence encoding one or
more Pol-containing polypeptides, wherein the polynucleotide sequence comprises a
sequence having at least about 85%, preferably about 90%, more preferably about 95%, and
most preferably about 98% sequence identity (including any integer value between these
percentages) to the sequences taught in the present specification. The polynucleotide sequences
encoding Pol-containing polypeptides include, but are not limited to, those shown in SEQ ID NO:30;
SEQ ID NO:31; SEQ ID NO:32; SEQ ID NO:62; SEQ ID NO:103; SEQ ID NO:58; SEQ ID
NO:60; SEQ ID NO:64; SEQ ID NO:66; SEQ ID NO:68; SEQ ID NO:70; SEQ ID NO:76;
and SEQ ID NO:78.

The polynucleotides encoding the HIV polypeptides of the present invention may also
include sequences encoding additional polypeptides. Such additional polynucleotides
encoding polypeptides may include, for example, coding sequences for other viral proteins
(e.g., hepatitis B or C proteins, or other HIV proteins, such as polynucleotide sequences encoding an
HIV Gag polypeptide, polynucleotide sequences encoding an HIV Env polypeptide and/or
polynucleotides encoding one or more of vif, vpr, tat, rev, vpu and nef), cytokines or other
transgenes. In one embodiment, the sequence encoding the HIV Pol polypeptide(s) can be
modified by deletions of coding regions corresponding to reverse transcriptase and integrase.
Such deletions in the polymerase polypeptide can also be made such that the polynucleotide
sequence preserves T-helper cell and CTL epitopes. Other antigens of interest may be
inserted into the polymerase as well.

In another embodiment, an expression cassette comprises a polynucleotide sequence 
encoding a polypeptide including an HIV Gag-containing polypeptide, wherein the 
polynucleotide sequence encoding the Gag polypeptide comprises a sequence having at least 
about 85%, preferably about 90%, more preferably about 95%, and most preferably about 
98% sequence identity to the sequences taught in the present specification. The 
polynucleotide sequences encoding Gag-containing polypeptides include, but are not limited 
to, the following polynucleotides: nucleotides 844-903 of Figure 1 (a Gag major homology 
region) (SEQ ID NO:1); nucleotides 841-900 of Figure 2 (a Gag major homology region)
(SEQ ID NO:2); Figure 24 (SEQ ID NO:53, a Gag major homology region); the sequence 
presented as Figure 1 (SEQ ID NO:3); the sequence presented as Figure 22 (SEQ ID NO:51); 
the sequence presented as Figure 70 (SEQ ID NO:99); and the sequence presented as Figure 2 
(SEQ ID NO:4). As noted above, the polynucleotides encoding the Gag-containing 
polypeptides of the present invention may also include sequences encoding additional 
polypeptides. 

In another embodiment, an expression cassette comprises a polynucleotide sequence
encoding a polypeptide including an HIV Env-containing polypeptide, wherein the
polynucleotide sequence encoding the Env polypeptide comprises a sequence having at least
about 85%, preferably about 90%, more preferably about 95%, and most preferably about
98% sequence identity to the sequences taught in the present specification. The
polynucleotide sequences encoding Env-containing polypeptides include, but are not limited
to, the following polynucleotides: nucleotides 1213-1353 of Figure 3 (SEQ ID NO:5)
(encoding an Env common region); the sequence presented as Figure 17 (SEQ ID NO:46)
(encoding a 97 nucleotide long Env common region); SEQ ID NO:47 (encoding a 144
nucleotide long Env common region); nucleotides 82-1512 of Figure 3 (SEQ ID NO:6)
(encoding a gp120 polypeptide); nucleotides 82-2025 of Figure 3 (SEQ ID NO:7) (encoding a
gp140 polypeptide); nucleotides 82-2547 of Figure 3 (SEQ ID NO:8) (encoding a gp160
polypeptide); SEQ ID NO:49 (encoding a gp160 polypeptide); nucleotides 1-2547 of Figure 3
(SEQ ID NO:9) (encoding a gp160 polypeptide with signal sequence); nucleotides 1513-2547
of Figure 3 (SEQ ID NO:10) (encoding a gp41 polypeptide); nucleotides 1210-1353 of
Figure 4 (SEQ ID NO:11) (encoding an Env common region); nucleotides 73-1509 of Figure
4 (SEQ ID NO:12) (encoding a gp120 polypeptide); nucleotides 73-2022 of Figure 4 (SEQ
ID NO:13) (encoding a gp140 polypeptide); nucleotides 73-2565 of Figure 4 (SEQ ID
NO:14) (encoding a gp160 polypeptide); nucleotides 1-2565 of Figure 4 (SEQ ID NO:15)
(encoding a gp160 polypeptide with signal sequence); the sequence presented as Figure 20
(SEQ ID NO:49) (encoding a gp160 polypeptide); the sequence presented as Figure 68 (SEQ
ID NO:97) (encoding a gp160 polypeptide); nucleotides 1510-2565 of Figure 4 (SEQ ID
NO:16) (encoding a gp41 polypeptide); nucleotides 7 to 1464 of Figure 90 (SEQ ID NO:119)
(encoding a gp120 polypeptide with modified wild type signal sequence); nucleotides 7 to
1977 of Figure 91 (SEQ ID NO:120) (encoding a gp140 polypeptide including signal
sequence modified from wild-type 8_2_TV1_C.ZA (e.g., "modified wild type leader
sequence")); nucleotides 7 to 1977 of Figure 92 (SEQ ID NO:121) (encoding a gp140
polypeptide with modified wild type 8_2_TV1_C.ZA signal sequence); nucleotides 7 to 2388
of Figure 93 (SEQ ID NO:122) (encoding a gp160 polypeptide with modified wild type
signal sequence); nucleotides 7 to 2520 of Figure 94 (SEQ ID NO:123) (encoding a gp160
polypeptide with modified wild type 8_2_TV1_C.ZA signal sequence); nucleotides 7 to 2520
of Figure 95 (SEQ ID NO:124) (encoding a gp160 polypeptide with modified wild type
8_2_TV1_C.ZA signal sequence); nucleotides 13 to 2604 of Figure 96 (SEQ ID NO:125)
(encoding a gp160 polypeptide with TPA1 signal sequence); nucleotides 7 to 2607 of Figure
97 (SEQ ID NO:126) (encoding a gp160 polypeptide with modified wild type
8_2_TV1_C.ZA signal sequence); nucleotides 1 to 2049 of Figure 100 (SEQ ID NO:131)
(encoding a gp140 polypeptide with TPA1 signal sequence); nucleotides 7 to 1607 of Figure
98 (SEQ ID NO:126) (encoding a gp160 polypeptide with wild type 8_2_TV1_C.ZA signal
sequence); nucleotides 7 to 2064 of SEQ ID NO:132 (encoding a gp140 polypeptide with
modified wild-type 8_2_TV1_C.ZA leader sequence); and nucleotides 7 to 2064 of SEQ ID
NO:133 (encoding a gp140 polypeptide with wild-type 8_2_TV1_C.ZA leader sequence).
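Many of the Env embodiments above are identified by nucleotide ranges within the figures, so a quick check of the range arithmetic can help. The minimal Python sketch below assumes the ranges are 1-based and inclusive at both ends. That is the usual convention for sequence listings, but it is an assumption here. `region_length` and `extract_region` are hypothetical helper names, not part of the specification.

```python
def region_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive nucleotide range."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return end - start + 1


def extract_region(seq: str, start: int, end: int) -> str:
    """Return nucleotides start..end (1-based, inclusive) of seq."""
    region_length(start, end)  # validates the coordinates
    if end > len(seq):
        raise ValueError("range extends past the end of the sequence")
    return seq[start - 1:end]  # convert to Python's 0-based, end-exclusive slice


# Ranges quoted in the text for Figure 3 and Figure 4.
ranges = {
    "gp120, Fig. 3 (82-1512)": (82, 1512),
    "gp140, Fig. 3 (82-2025)": (82, 2025),
    "gp160, Fig. 3 (82-2547)": (82, 2547),
    "gp41, Fig. 3 (1513-2547)": (1513, 2547),
    "gp160, Fig. 4 (73-2565)": (73, 2565),
}
for label, (start, end) in ranges.items():
    n = region_length(start, end)
    print(f"{label}: {n} nt, {n / 3:g} codons")
```

Under this reading, the gp120 range (82-1512) and the gp41 range (1513-2547) abut. Together they span the gp160 range (82-2547), and every listed range is a whole number of codons.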

In certain embodiments, the Env-encoding sequences will contain further
modifications, for instance mutation of the cleavage site to prevent the cleavage of a gp140
polypeptide into a gp120 polypeptide and a gp41 polypeptide (SEQ ID NO:121 and SEQ ID
NO:124) or deletion of variable regions V1 and/or V2 (SEQ ID NO:119; SEQ ID NO:120;
SEQ ID NO:121; SEQ ID NO:122; SEQ ID NO:123; and SEQ ID NO:124).




In another embodiment, an expression cassette comprises a polynucleotide sequence
encoding a polypeptide including an HIV Nef-containing polypeptide, wherein the
polynucleotide sequence encoding the Nef polypeptide comprises a sequence having at least
about 85%, preferably about 90%, more preferably about 95%, and most preferably about
98% sequence identity to the sequences taught in the present specification. The
polynucleotide sequences encoding Nef-containing polypeptides include, but are not limited
to, the following polynucleotides: the sequence presented in Figure 26 (SEQ ID NO:55); the
sequence presented in Figure 72 (SEQ ID NO:101); the sequence presented in Figure 28
(SEQ ID NO:57); the sequence presented in Figure 67 (SEQ ID NO:96); the sequence
presented in Figure 103 (SEQ ID NO:134); and the sequence presented in Figure 104 (SEQ
ID NO:135).

In another embodiment, an expression cassette comprises a polynucleotide sequence
encoding a polypeptide including an HIV Rev-containing polypeptide, wherein the
polynucleotide sequence encoding the Rev polypeptide comprises a sequence having at least
about 85%, preferably about 90%, more preferably about 95%, and most preferably about
98% sequence identity to the sequences taught in the present specification. The
polynucleotide sequences encoding Rev-containing polypeptides include, but are not limited
to, the following polynucleotides: the sequence presented in Figure 43 (SEQ ID NO:72); the
sequence presented in Figure 76 (SEQ ID NO:105); the sequence presented in Figure 45
(SEQ ID NO:74); the sequence presented in Figure 78 (SEQ ID NO:107); and the sequence
presented in Figure 62 (SEQ ID NO:91).

In another embodiment, an expression cassette comprises a polynucleotide sequence
encoding a polypeptide including an HIV Tat-containing polypeptide, wherein the
polynucleotide sequence encoding the Tat polypeptide comprises a sequence having at least
about 85%, preferably about 90%, more preferably about 95%, and most preferably about
98% sequence identity to the sequences taught in the present specification. The
polynucleotide sequences encoding Tat-containing polypeptides include, but are not limited
to, the following polynucleotides: the sequence presented in Figure 51 (SEQ ID NO:80); the
sequence presented in Figure 80 (SEQ ID NO:109); the sequence presented in Figure 52
(SEQ ID NO:81); the sequence presented in Figure 54 (SEQ ID NO:83); and the sequence
presented in Figure 82 (SEQ ID NO:111).

In another embodiment, an expression cassette comprises a polynucleotide sequence
encoding a polypeptide including an HIV Vif-containing polypeptide, wherein the
polynucleotide sequence encoding the Vif polypeptide comprises a sequence having at least
about 85%, preferably about 90%, more preferably about 95%, and most preferably about
98% sequence identity to the sequences taught in the present specification. The
polynucleotide sequences encoding Vif-containing polypeptides include, but are not limited
to, the following polynucleotides: the sequence presented in Figure 56 (SEQ ID NO:85); and
the sequence presented in Figure 84 (SEQ ID NO:113).

In another embodiment, an expression cassette comprises a polynucleotide sequence
encoding a polypeptide including an HIV Vpr-containing polypeptide, wherein the
polynucleotide sequence encoding the Vpr polypeptide comprises a sequence having at least
about 85%, preferably about 90%, more preferably about 95%, and most preferably about
98% sequence identity to the sequences taught in the present specification. The
polynucleotide sequences encoding Vpr-containing polypeptides include, but are not limited
to, the following polynucleotides: the sequence presented in Figure 58 (SEQ ID NO:87); and
the sequence presented in Figure 86 (SEQ ID NO:115).

In another embodiment, an expression cassette comprises a polynucleotide sequence
encoding a polypeptide including an HIV Vpu-containing polypeptide, wherein the
polynucleotide sequence encoding the Vpu polypeptide comprises a sequence having at least
about 85%, preferably about 90%, more preferably about 95%, and most preferably about
98% sequence identity to the sequences taught in the present specification. The
polynucleotide sequences encoding Vpu-containing polypeptides include, but are not limited
to, the following polynucleotides: the sequence presented in Figure 60 (SEQ ID NO:89); and
the sequence presented in Figure 88 (SEQ ID NO:117).

Further embodiments of the present invention include purified polynucleotides of any
of the sequences described herein. Exemplary polynucleotide sequences encoding
Gag-containing polypeptides include, but are not limited to, the following polynucleotides:
nucleotides 844-903 of Figure 1 (SEQ ID NO:1) (a Gag major homology region); nucleotides
841-900 of Figure 2 (SEQ ID NO:2) (a Gag major homology region); the sequence presented
as Figure 1 (SEQ ID NO:3); the sequence presented as Figure 2 (SEQ ID NO:4); the
sequence presented as Figure 22 (SEQ ID NO:51); the sequence presented as Figure 70 (SEQ
ID NO:99); and the sequence presented as Figure 24 (SEQ ID NO:53) (a Gag major
homology region).

Exemplary polynucleotide sequences encoding Env-containing polypeptides include,
but are not limited to, the following polynucleotides: nucleotides 1213-1353 of Figure 3
(SEQ ID NO:5) (encoding an Env common region); the sequence presented as Figure 17
(SEQ ID NO:46) (encoding a 97 nucleotide long Env common region); SEQ ID NO:47
(encoding a 144 nucleotide long Env common region); nucleotides 82-1512 of Figure 3 (SEQ
ID NO:6) (encoding a gp120 polypeptide); nucleotides 82-2025 of Figure 3 (SEQ ID NO:7)
(encoding a gp140 polypeptide); nucleotides 82-2547 of Figure 3 (SEQ ID NO:8) (encoding a
gp160 polypeptide); SEQ ID NO:49 (encoding a gp160 polypeptide); nucleotides 1-2547 of
Figure 3 (SEQ ID NO:9) (encoding a gp160 polypeptide with signal sequence); nucleotides
1513-2547 of Figure 3 (SEQ ID NO:10) (encoding a gp41 polypeptide); nucleotides
1210-1353 of Figure 4 (SEQ ID NO:11) (encoding an Env common region); nucleotides 73-1509
of Figure 4 (SEQ ID NO:12) (encoding a gp120 polypeptide); nucleotides 73-2022 of Figure
4 (SEQ ID NO:13) (encoding a gp140 polypeptide); nucleotides 73-2565 of Figure 4 (SEQ
ID NO:14) (encoding a gp160 polypeptide); nucleotides 1-2565 of Figure 4 (SEQ ID NO:15)
(encoding a gp160 polypeptide with signal sequence); the sequence presented as Figure 20
(SEQ ID NO:49) (encoding a gp160 polypeptide); the sequence presented as Figure 68 (SEQ
ID NO:97) (encoding a gp160 polypeptide); nucleotides 1510-2565 of Figure 4 (SEQ ID
NO:16) (encoding a gp41 polypeptide); nucleotides 7 to 1464 of Figure 90 (SEQ ID NO:119)
(encoding a gp120 polypeptide with modified wild type signal sequence); nucleotides 7 to
1977 of Figure 91 (SEQ ID NO:120) (encoding a gp140 polypeptide including signal
sequence modified from wild-type 8_2_TV1_C.ZA (e.g., "modified wild type leader
sequence")); nucleotides 7 to 1977 of Figure 92 (SEQ ID NO:121) (encoding a gp140
polypeptide with modified wild type 8_2_TV1_C.ZA signal sequence); nucleotides 7 to 2388
of Figure 93 (SEQ ID NO:122) (encoding a gp160 polypeptide with modified wild type
signal sequence); nucleotides 7 to 2520 of Figure 94 (SEQ ID NO:123) (encoding a gp160
polypeptide with modified wild type 8_2_TV1_C.ZA signal sequence); nucleotides 7 to 2520
of Figure 95 (SEQ ID NO:124) (encoding a gp160 polypeptide with modified wild type
8_2_TV1_C.ZA signal sequence); nucleotides 13 to 2604 of Figure 96 (SEQ ID NO:125)
(encoding a gp160 polypeptide with TPA1 signal sequence); nucleotides 7 to 2607 of Figure
97 (SEQ ID NO:126) (encoding a gp160 polypeptide with modified wild type
8_2_TV1_C.ZA signal sequence); nucleotides 1 to 2049 of Figure 100 (SEQ ID NO:131)
(encoding a gp140 polypeptide with TPA1 signal sequence); nucleotides 7 to 1607 of Figure
98 (SEQ ID NO:126) (encoding a gp160 polypeptide with wild type 8_2_TV1_C.ZA signal
sequence); nucleotides 7 to 2064 of SEQ ID NO:132 (encoding a gp140 polypeptide with
modified wild-type 8_2_TV1_C.ZA leader sequence); and nucleotides 7 to 2064 of SEQ ID
NO:133 (encoding a gp140 polypeptide with wild-type 8_2_TV1_C.ZA leader sequence).

Exemplary purified polynucleotides encoding additional HIV polypeptides
include: Pol-encoding polynucleotides (e.g., SEQ ID NO:30; SEQ ID NO:31; SEQ ID
NO:32; SEQ ID NO:62; SEQ ID NO:103; SEQ ID NO:58; SEQ ID NO:60; SEQ ID NO:64;
SEQ ID NO:66; SEQ ID NO:68; SEQ ID NO:70; SEQ ID NO:76; and SEQ ID NO:78);
Nef-encoding polynucleotides (e.g., SEQ ID NO:55; SEQ ID NO:101; SEQ ID NO:57; SEQ ID
NO:96); Rev-encoding polynucleotides (e.g., SEQ ID NO:72; SEQ ID NO:105; SEQ ID
NO:74; SEQ ID NO:107; SEQ ID NO:91); Tat-encoding polynucleotides (e.g., SEQ ID
NO:80; SEQ ID NO:109; SEQ ID NO:81; SEQ ID NO:83; SEQ ID NO:111); Vif-encoding
polynucleotides (e.g., SEQ ID NO:85; SEQ ID NO:113); Vpr-encoding polynucleotides
(e.g., SEQ ID NO:87; SEQ ID NO:115); and Vpu-encoding polynucleotides (e.g., SEQ ID
NO:89; SEQ ID NO:117).

In other embodiments, the present invention relates to native HIV polypeptide-encoding
sequences obtained from novel Type C strains; fragments of these native sequences;
expression cassettes containing these wild-type sequences; and uses of these sequences,
fragments and expression cassettes. Exemplary full-length sequences are shown in SEQ ID
NO:33 and SEQ ID NO:45. Exemplary fragments coding for various HIV gene products
include: the sequence presented in Figure 19 (SEQ ID NO:48) (an Env-encoding sequence);
the sequence presented in Figure 69 (SEQ ID NO:98) (an Env-encoding sequence); the
sequence presented in Figure 21 (SEQ ID NO:50) (a gp160 polypeptide); the sequence
presented in Figure 23 (SEQ ID NO:52) (a Gag polypeptide); the sequence presented in
Figure 71 (SEQ ID NO:100) (a Gag polypeptide); the sequence presented in Figure 25 (SEQ
ID NO:54) (a Gag polypeptide); the sequence presented in Figure 27 (SEQ ID NO:56) (a Nef
polypeptide); the sequence presented in Figure 73 (SEQ ID NO:102) (a Nef polypeptide); the
sequence presented in Figure 30 (SEQ ID NO:59) (a p15 RNaseH polypeptide); the sequence
presented in Figure 32 (SEQ ID NO:61) (a p31 Integrase polypeptide); the sequence presented
in Figure 34 (SEQ ID NO:63) (a Pol polypeptide); the sequence presented in Figure 75 (SEQ
ID NO:104) (a Pol polypeptide); the sequence presented in Figure 36 (SEQ ID NO:65) (a
Prot polypeptide); the sequence presented in Figure 38 (SEQ ID NO:67) (an inactivated Prot
polypeptide); the sequence presented in Figure 40 (SEQ ID NO:69) (an inactivated Prot and
RT polypeptide); the sequence presented in Figure 42 (SEQ ID NO:71) (a Prot and RT
polypeptide); the sequence presented in Figure 44 (SEQ ID NO:73) (a Rev polypeptide); the
sequence presented in Figure 77 (SEQ ID NO:106) (a Rev polypeptide); the sequence
presented in Figure 46 (SEQ ID NO:75) (a Rev polypeptide); the sequence presented in
Figure 79 (SEQ ID NO:108) (a Rev polypeptide); the sequence presented in Figure 48 (SEQ
ID NO:77) (an RT polypeptide); the sequence presented in Figure 50 (SEQ ID NO:79) (a
mutated RT polypeptide); the sequence presented in Figure 53 (SEQ ID NO:82) (a Tat
polypeptide); the sequence presented in Figure 81 (SEQ ID NO:110) (a Tat polypeptide); the
sequence presented in Figure 55 (SEQ ID NO:84) (a Tat polypeptide); the sequence presented
in Figure 83 (SEQ ID NO:112) (a Tat polypeptide); the sequence presented in Figure 57
(SEQ ID NO:86) (a Vif polypeptide); the sequence presented in Figure 85 (SEQ ID NO:114)
(a Vif polypeptide); the sequence presented in Figure 59 (SEQ ID NO:88) (a Vpr
polypeptide); the sequence presented in Figure 87 (SEQ ID NO:116) (a Vpr polypeptide); the
sequence presented in Figure 61 (SEQ ID NO:90) (a Vpu polypeptide); the sequence
presented in Figure 89 (SEQ ID NO:118) (a Vpu polypeptide); the sequence presented in
Figure 63 (SEQ ID NO:92) (a Rev polypeptide); and the sequence presented in Figure 66
(SEQ ID NO:95) (a Tat polypeptide).

The native and synthetic polynucleotide sequences encoding the HIV polypeptides of 
the present invention typically have at least about 85%, preferably about 90%, more 
preferably about 95%, and most preferably about 98% sequence identity to the sequences 
taught herein. Further, in certain embodiments, the polynucleotide sequences encoding the 
HIV polypeptides of the invention will exhibit 100% sequence identity to the sequences 
taught herein. 
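The identity thresholds recited throughout (about 85%, 90%, 95%, 98%, or 100%) are percentages computed over an alignment. The sketch below assumes the two sequences have already been aligned, for example by a standard global or local alignment program. It uses one common convention: identical non-gap columns divided by the number of aligned columns, with gaps counted as mismatches. The specification may define identity differently, so treat `percent_identity` and `meets_threshold` as hypothetical illustrations rather than the method of the invention.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Columns where both sequences have a gap are ignored; any other column
    containing a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 85.0) -> bool:
    """True if the pair reaches the given identity threshold (in percent)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


query = "ATGCATGCAT"
reference = "ATGCATGCAA"
print(percent_identity(query, reference))       # 90.0
print(meets_threshold(query, reference, 85.0))  # True
print(meets_threshold(query, reference, 95.0))  # False
```

The result depends on the alignment program, its scoring parameters, and how gaps are counted. Two tools can therefore report slightly different identities for the same pair of sequences.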

The polynucleotides of the present invention can be produced by recombinant 
techniques, synthetic techniques, or combinations thereof. 

The present invention further includes recombinant expression systems for use in 
selected host cells, wherein the recombinant expression systems employ one or more of the 
polynucleotides and expression cassettes of the present invention. In such systems, the 
polynucleotide sequences are operably linked to control elements compatible with expression 
in the selected host cell. Numerous expression control elements are known to those skilled in the art,
including, but not limited to, the following: transcription promoters, transcription enhancer
elements, transcription termination signals, polyadenylation sequences, sequences for
optimization of initiation of translation, and translation termination sequences. Exemplary
transcription promoters include, but are not limited to, those derived from CMV, CMV+intron
A, SV40, RSV, HIV-LTR, MMLV-LTR, and metallothionein.

In another aspect, the invention includes cells comprising one or more of the
expression cassettes of the present invention, where the polynucleotide sequences are operably
linked to control elements compatible with expression in the selected cell. In one
embodiment, such cells are mammalian cells. Exemplary mammalian cells include, but are
not limited to, BHK, VERO, HT1080, 293, RD, COS-7, and CHO cells. Other cells, cell
types, tissue types, etc., that may be useful in the practice of the present invention include,
but are not limited to, those obtained from the following: insects (e.g., Trichoplusia ni (Tn5)
and Sf9), bacteria, yeast, plants, antigen presenting cells (e.g., macrophages, monocytes,
dendritic cells, B-cells, T-cells, stem cells, and progenitor cells thereof), primary cells,
immortalized cells, and tumor-derived cells.

In a further aspect, the present invention includes compositions for generating an
immunological response, where the composition typically comprises at least one of the
expression cassettes of the present invention and may, for example, contain combinations of
expression cassettes (such as one or more expression cassettes carrying a
Pol-polypeptide-encoding polynucleotide, one or more expression cassettes carrying a
Gag-polypeptide-encoding polynucleotide, one or more expression cassettes carrying accessory
polypeptide-encoding polynucleotides (e.g., native or synthetic vpu, vpr, nef, vif, tat, rev), and/or one or
more expression cassettes carrying an Env-polypeptide-encoding polynucleotide). Such
compositions may further contain an adjuvant or adjuvants. The compositions may also
contain one or more Type C HIV polypeptides. The Type C HIV polypeptides may
correspond to the polypeptides encoded by the expression cassette(s) in the composition, or
may be different from those encoded by the expression cassettes. An example in which the
polynucleotide in the expression cassette encodes the same polypeptide as is provided
in the composition is as follows: the polynucleotide in the expression cassette encodes the
Gag polypeptide of Figure 1 (SEQ ID NO:3), and the polypeptide (SEQ ID NO:17) is the
polypeptide encoded by the sequence shown in Figure 1 . An example of the polynucleotide in 

11 
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the expression cassette encoding a different polypeptide as is being provided in the 
composition is as follows: an expression cassette having a polynucleotide encoding a Gag- 
polymerase polypeptide, and the polypeptide provided in the composition may be a Gag 
and/or Gag-protease polypeptide. In compositions containing both expression cassettes (or 
polynucleotides of the present invention) and polypeptides, various expression cassettes of 
the present invention can be mixed and/or matched with various Type C HIV polypeptides 
described herein. 

In another aspect the present invention includes methods of immunizing a
subject. In the method, any of the above-described compositions are introduced into the
subject under conditions that are compatible with expression of the expression cassette(s) in
the subject. In one embodiment, the expression cassettes (or polynucleotides of the present
invention) can be introduced using a gene delivery vector. The gene delivery vector can, for
example, be a non-viral vector or a viral vector. Exemplary viral vectors include, but are not
limited to, Sindbis-virus-derived vectors, retroviral vectors, and lentiviral vectors.
Compositions useful for generating an immunological response can also be delivered using a
particulate carrier. Further, such compositions can be coated on, for example, gold or
tungsten particles and the coated particles delivered to the subject using, for example, a gene
gun. The compositions can also be formulated as liposomes. In one embodiment of this
method, the subject is a mammal and can, for example, be a human.

In a further aspect, the invention includes methods of generating an immune response
in a subject. Any of the expression cassettes described herein can be expressed in a suitable
cell to provide for the expression of the Type C HIV polypeptides encoded by the
polynucleotides of the present invention. The polypeptide(s) are then isolated (e.g.,
substantially purified) and administered to the subject in an amount sufficient to elicit an
immune response. In certain embodiments, the methods comprise administration of one or
more of the expression cassettes or polynucleotides of the present invention, using any of the
gene delivery techniques described herein. In other embodiments, the methods comprise
co-administration of one or more of the expression cassettes or polynucleotides of the present
invention and one or more polypeptides, wherein the polypeptides can be expressed from
these polynucleotides or can be other subtype C HIV polypeptides. In other embodiments,
the methods comprise co-administration of multiple expression cassettes or polynucleotides
of the present invention. In still further embodiments, the methods comprise
co-administration of multiple polypeptides, for example polypeptides expressed from the
polynucleotides of the present invention and/or other subtype C HIV polypeptides.

The invention further includes methods of generating an immune response in a
subject, where cells of a subject are transfected with any of the above-described expression
cassettes or polynucleotides of the present invention, under conditions that permit the
expression of a selected polynucleotide and production of a polypeptide of interest (e.g.,
encoded by any expression cassette of the present invention). By this method an
immunological response to the polypeptide is elicited in the subject. Transfection of the cells
may be performed ex vivo, with the transfected cells then reintroduced into the subject.
Alternately, or in addition, the cells may be transfected in vivo in the subject. The immune
response may be humoral and/or cell-mediated (cellular). In a further embodiment, this
method may also include administration of one or more Type C HIV polypeptides before,
concurrently with, and/or after introduction of the expression cassette into the subject.

These and other embodiments of the present invention will readily occur to those of 
ordinary skill in the art in view of the disclosure herein. 

Brief Description of the Figures 

Figure 1 (SEQ ID NO:3) shows the nucleotide sequence of a polynucleotide encoding
a synthetic Gag polypeptide. The nucleotide sequence shown was obtained by modifying
type C strain AF110965 and includes further modifications of INS.

Figure 2 (SEQ ID NO:4) shows the nucleotide sequence of a polynucleotide encoding
a synthetic Gag polypeptide. The nucleotide sequence shown was obtained by modifying
type C strain AF110967 and includes further modifications of INS.

Figure 3 (SEQ ID NO:9) shows the nucleotide sequence of a polynucleotide encoding
a synthetic Env polypeptide. The nucleotide sequence depicts gp160 (including a signal
peptide) and was obtained by modifying type C strain AF110968. The arrows indicate the
positions of various regions of the polynucleotide, including the sequence encoding a signal
peptide (nucleotides 1-81) (SEQ ID NO:18), a gp120 polypeptide (nucleotides 82-1512)
(SEQ ID NO:6), a gp41 polypeptide (nucleotides 1513-2547) (SEQ ID NO:10), a gp140
polypeptide (nucleotides 82-2025) (SEQ ID NO:7) and a gp160 polypeptide (nucleotides
82-2547) (SEQ ID NO:8). The codons encoding the signal peptide are modified (as described
herein) from the native HIV-1 signal sequence.
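The nucleotide ranges given for Figure 3 are 1-based and inclusive, so each region's length is end - start + 1. As an illustrative consistency check (a sketch, not a method disclosed in this application), the Python below computes those lengths, confirms that every region spans a whole number of codons, and confirms that the gp120 and gp41 segments abut and together make up the gp160 segment. The names `FIG3_REGIONS`, `span_length` and `codon_count` are hypothetical.

```python
# Hypothetical table of the Figure 3 (SEQ ID NO:9) regions,
# as 1-based, inclusive nucleotide ranges.
FIG3_REGIONS = {
    "signal peptide": (1, 81),
    "gp120": (82, 1512),
    "gp41": (1513, 2547),
    "gp140": (82, 2025),
    "gp160": (82, 2547),
}


def span_length(start: int, end: int) -> int:
    """Length in nucleotides of a 1-based, inclusive range."""
    return end - start + 1


def codon_count(start: int, end: int) -> int:
    """Number of codons in a range; raises if the range is not in frame."""
    length = span_length(start, end)
    if length % 3 != 0:
        raise ValueError(f"range {start}-{end} ({length} nt) is not a multiple of 3")
    return length // 3


for name, (start, end) in FIG3_REGIONS.items():
    print(f"{name}: {span_length(start, end)} nt, {codon_count(start, end)} codons")

# gp120 is immediately followed by gp41, and the two together span gp160.
assert FIG3_REGIONS["gp120"][1] + 1 == FIG3_REGIONS["gp41"][0]
assert (span_length(*FIG3_REGIONS["gp120"]) + span_length(*FIG3_REGIONS["gp41"])
        == span_length(*FIG3_REGIONS["gp160"]))
```

With the Figure 3 coordinates, the loop reports 27 codons for the signal peptide, 477 for gp120, 345 for gp41, 648 for gp140 and 822 for gp160.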

Figure 4 (SEQ ID NO:15) shows the nucleotide sequence of a polynucleotide
encoding a synthetic Env polypeptide. The nucleotide sequence depicts gp160 (including a
signal peptide) and was obtained by modifying type C strain AF110975. The arrows indicate
the positions of various regions of the polynucleotide, including the sequence encoding a
signal peptide (nucleotides 1-72) (SEQ ID NO:19), a gp120 polypeptide (nucleotides
73-1509) (SEQ ID NO:12), a gp41 polypeptide (nucleotides 1510-2565) (SEQ ID NO:16), a
gp140 polypeptide (nucleotides 73-2022) (SEQ ID NO:13), and a gp160 polypeptide
(nucleotides 73-2565) (SEQ ID NO:14). The codons encoding the signal peptide are
modified (as described herein) from the native HIV-1 signal sequence.

Figure 5 shows the location of some remaining INS in synthetic Gag sequences
derived from AF110965. The changes made to these sequences are boxed in the Figures.
The top line depicts a codon-modified sequence of Gag polypeptides from the indicated
strains (SEQ ID NO:20). The nucleotide(s) appearing below the line in the boxed region(s)
depict changes made to remove further INS and correspond to the sequence depicted in
Figure 1 (SEQ ID NO:3).

Figure 6 shows the location of some remaining INS in synthetic Gag sequences
derived from AF110967. The changes made to these sequences are boxed in the Figures.
The top line depicts a modified sequence of Gag polypeptides from the indicated strains
(SEQ ID NO:21). The nucleotide(s) appearing below the line in the boxed region(s) depict
changes made to remove further INS and correspond to the sequence depicted in Figure 2
(SEQ ID NO:4).

Figure 7 is a schematic depicting the selected domains in the Pol region of HIV. 

Figure 8 (SEQ ID NO:30) depicts the nucleotide sequence of the synthetic construct
designated PR975(+). "(+)" indicates that the reverse transcriptase is functional. This
construct includes sequence from p2 (nucleotides 16 to 54 of SEQ ID NO:30); p7
(nucleotides 55 to 219 of SEQ ID NO:30); p1/p6 (nucleotides 220 to 375 of SEQ ID NO:30);
prot (nucleotides 376 to 672 of SEQ ID NO:30); reverse transcriptase (nucleotides 673 to
2352 of SEQ ID NO:30); and 6 amino acids of integrase shown in Figure 7 (nucleotides 2353
to 2370 of SEQ ID NO:30). In addition, the construct contains a multiple cloning site (MCS,
nucleotides 2425 to 2463 of SEQ ID NO:30) for insertion of a transgene and a YMDD
epitope cassette (nucleotides 2371 to 2424 of SEQ ID NO:30).
Figure 9 (SEQ ID NO:31) depicts the nucleotide sequence of the synthetic construct
designated PR975YM. As illustrated in Figure 7, the RT region includes a mutation in the
catalytic center (mut. cat. center). "YM" refers to constructs in which the nucleotides encode
the amino acids AP instead of YMDD in this region. Reverse transcriptase is not functional
in this construct. This construct includes sequence from p2 (nucleotides 16 to 54 of SEQ
ID NO:31); p7 (nucleotides 55 to 219 of SEQ ID NO:31); p1/p6 (nucleotides 220 to 375 of
SEQ ID NO:31); prot (nucleotides 376 to 672 of SEQ ID NO:31); and reverse transcriptase
(nucleotides 673 to 2346 of SEQ ID NO:31) shown in Figure 7, although the reverse
transcriptase protein is not functional. In addition, the construct contains a multiple cloning
site (MCS, nucleotides 2419 to 2457 of SEQ ID NO:31) for insertion of a transgene and a
YMDD epitope cassette (nucleotides 2365 to 2418 of SEQ ID NO:31).

Figure 10 (SEQ ID NO:32) depicts the nucleotide sequence of the synthetic construct
designated PR975YMWM. "YM" refers to constructs in which the nucleotides encode the
amino acids AP instead of YMDD in this region. "WM" refers to constructs in which the
nucleotides encode amino acids PI instead of WMGY in this region. This construct includes
sequence from p2 (nucleotides 16 to 54 of SEQ ID NO:32); p7 (nucleotides 55 to 219 of
SEQ ID NO:32); p1/p6 (nucleotides 220 to 375 of SEQ ID NO:32); prot (nucleotides 376 to
672 of SEQ ID NO:32); and reverse transcriptase (nucleotides 673 to 2340 of SEQ ID
NO:32) shown in Figure 7, although the reverse transcriptase protein is not functional. In
addition, the construct contains a multiple cloning site (MCS, nucleotides 2413 to 2451 of
SEQ ID NO:32) for insertion of a transgene and a YMDD epitope cassette (nucleotides 2359
to 2412 of SEQ ID NO:32).
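The coordinate differences between Figures 8, 9 and 10 follow from the substitutions themselves: replacing the four codons for YMDD with two codons for AP removes 6 nucleotides, and replacing the four codons for WMGY with two codons for PI removes another 6, so every downstream coordinate moves up by 6 each time. The Python sketch below reproduces that arithmetic starting from the PR975(+) coordinates. The function names are hypothetical, and the calculation assumes each substitution is an in-frame swap of one codon per amino acid located upstream of the coordinates being shifted.

```python
CODON = 3  # nucleotides per codon


def shift_after_substitution(coord: int, old_motif: str, new_motif: str) -> int:
    """Shift a downstream 1-based coordinate after an in-frame motif swap.

    Assumes each amino acid is encoded by exactly one codon, so the shift is
    (len(new_motif) - len(old_motif)) * 3 nucleotides.
    """
    return coord + (len(new_motif) - len(old_motif)) * CODON


def apply_substitution(coords: dict, old_motif: str, new_motif: str) -> dict:
    """Shift every coordinate (single ints or (start, end) tuples) in a map."""
    shifted = {}
    for name, value in coords.items():
        if isinstance(value, tuple):
            shifted[name] = tuple(shift_after_substitution(v, old_motif, new_motif) for v in value)
        else:
            shifted[name] = shift_after_substitution(value, old_motif, new_motif)
    return shifted


# Downstream coordinates of PR975(+) (SEQ ID NO:30), from the Figure 8 description.
pr975 = {"RT end": 2352, "YMDD cassette": (2371, 2424), "MCS": (2425, 2463)}

pr975ym = apply_substitution(pr975, "YMDD", "AP")        # compare Figure 9
pr975ymwm = apply_substitution(pr975ym, "WMGY", "PI")    # compare Figure 10
print(pr975ym)
print(pr975ymwm)
```

The shifted values (RT ending at 2346 and then 2340, the YMDD epitope cassette at 2365 to 2418 and then 2359 to 2412, and the MCS at 2419 to 2457 and then 2413 to 2451) agree with the coordinates given for PR975YM and PR975YMWM.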

Figure 11 (SEQ ID NO:33) depicts the nucleotide sequence of 8_5_TV1_C.ZA.
Various regions are shown in Table A. 
Figure 12 (SEQ ID NO:34) depicts the wild-type nucleotide sequence of AF110975
Pol from p2gag until p7gag.

Figure 13 (SEQ ID NO:35) depicts the wild-type nucleotide sequence of AF110975
Pol from p1 through the first 6 amino acids of the integrase protein.

Figure 14 (SEQ ID NO:36) depicts the nucleotide sequence of a cassette encoding
Ile178 through Ser191 of reverse transcriptase.

Figure 15 (SEQ ID NO:37) shows an amino acid sequence that includes an epitope in
the region of the catalytic center of the reverse transcriptase protein.
Figure 16 (SEQ ID NO:45) depicts the nucleotide sequence of 12-5_1_TV2_C.ZA.

Figure 17 (SEQ ID NO:46) depicts the nucleotide sequence of a synthetic Env-encoding
polynucleotide derived from 8_5_TV1_C.ZA. The sequence corresponds to a short
(97 base pair) common region. 

Figure 18 (SEQ ID NO:47) depicts the nucleotide sequence of a synthetic Env- 
encoding polynucleotide derived from 8_5_TV1_C.ZA. The sequence corresponds to a 
common region in Env. 

Figure 19 (SEQ ID NO:48) depicts the wild-type nucleotide sequence of
8_5_TV1_C.ZA Env.

Figure 20 (SEQ ID NO:49) depicts the nucleotide sequence of a synthetic Env
gp160-encoding polynucleotide derived from 8_5_TV1_C.ZA.

Figure 21 (SEQ ID NO:50) depicts the wild-type nucleotide sequence of
8_5_TV1_C.ZA Env gp160.

Figure 22 (SEQ ID NO:51) depicts the nucleotide sequence of a synthetic Gag- 
encoding polynucleotide derived from 8_5_TV1_C.ZA. 

Figure 23 (SEQ ID NO:52) depicts the wild-type nucleotide sequence of 
8_5_TV1_C.ZA Gag. 

Figure 24 (SEQ ID NO:53) depicts the nucleotide sequence of a synthetic Gag- 
encoding polynucleotide (major homology region) derived from 8_5_TV1_C.ZA. 

Figure 25 (SEQ ID NO:54) depicts the wild-type nucleotide sequence of 
8_5_TV1_C.ZA Gag major homology region. 

Figure 26 (SEQ ID NO:55) depicts the nucleotide sequence of a synthetic Nef-encoding
polynucleotide derived from 8_5_TV1_C.ZA.

Figure 27 (SEQ ID NO:56) depicts the wild-type nucleotide sequence of
8_5_TV1_C.ZA Nef.

Figure 28 (SEQ ID NO:57) depicts the nucleotide sequence of a synthetic Nef-encoding
polynucleotide derived from 8_5_TV1_C.ZA. The sequence includes a mutation at
position 125 that results in a non-functional gene product.

Figure 29 (SEQ ID NO:58) depicts the nucleotide sequence of a synthetic RNAseH-encoding
polynucleotide derived from 8_5_TV1_C.ZA. RNAseH is a functional domain of
the Pol gene, corresponding to p15 (Table A).
Figure 30 (SEQ ID NO:59) depicts the wild-type nucleotide sequence of
8_5_TV1_C.ZA RNAseH.

Figure 31 (SEQ ID NO:60) depicts the nucleotide sequence of a synthetic integrase
(Int)-encoding polynucleotide derived from 8_5_TV1_C.ZA. Int is a functional domain of
the Pol gene, corresponding to p31 (Table A).

Figure 32 (SEQ ID NO:61) depicts the wild-type nucleotide sequence of
8_5_TV1_C.ZA Int.

Figure 33 (SEQ ID NO: 62) depicts the nucleotide sequence of a synthetic Pol- 
encoding polynucleotide derived from 8_5_TV1_C.ZA. 
Figure 34 (SEQ ID NO:63) depicts the wild-type nucleotide sequence of
8_5_TV1_C.ZA Pol.

Figure 35 (SEQ ID NO:64) depicts the nucleotide sequence of a synthetic protease
(prot)-encoding polynucleotide derived from 8_5_TV1_C.ZA.

Figure 36 (SEQ ID NO:65) depicts the wild-type nucleotide sequence of
8_5_TV1_C.ZA Prot.

Figure 37 (SEQ ID NO:66) depicts the nucleotide sequence of a synthetic protease
(prot)-encoding polynucleotide derived from 8_5_TV1_C.ZA containing a mutation that
results in inactivation of the protease.

Figure 38 (SEQ ID NO:67) depicts the wild-type nucleotide sequence of
8_5_TV1_C.ZA inactivated Prot.

Figure 39 (SEQ ID NO: 68) depicts the nucleotide sequence of a synthetic protease 
(prot)-encoding polynucleotide and a synthetic reverse transcriptase (RT)-encoding 
polynucleotide, both derived from 8_5_TV1_C.ZA. The Prot and RT sequences both contain
a mutation which results in inactivation of the gene product. 
Figure 40 (SEQ ID NO:69) depicts the wild-type nucleotide sequence of
8_5_TV1_C.ZA inactivated Prot/mutated RT.

Figure 41 (SEQ ID NO:70) depicts the nucleotide sequence of a synthetic protease 
(prot)-encoding polynucleotide and a synthetic reverse transcriptase (RT)-encoding 
polynucleotide, both derived from 8_5_TV1_C.ZA. 
Figure 42 (SEQ ID NO:71) depicts the wild-type nucleotide sequence of
8_5_TV1_C.ZA Prot and RT.
Figure 43 (SEQ ID NO:72) depicts the nucleotide sequence of a synthetic rev-
encoding polynucleotide derived from 8_5_TV1_C.ZA. The synthetic sequence depicted 
corresponds to exon 1 of rev. Wild-type rev has two exons. 

Figure 44 (SEQ ID NO:73) depicts the wild-type nucleotide sequence of
8_5_TV1_C.ZA exon 1 of Rev.

Figure 45 (SEQ ID NO:74) depicts the nucleotide sequence of a synthetic rev-encoding
polynucleotide derived from 8_5_TV1_C.ZA. The synthetic sequence depicted
corresponds to exon 2 of rev. 

Figure 46 (SEQ ID NO:75) depicts the wild-type nucleotide sequence of
8_5_TV1_C.ZA exon 2 of Rev.

Figure 47 (SEQ ID NO:76) depicts the nucleotide sequence of a synthetic RT- 
encoding polynucleotide derived from 8_5_TV1_C.ZA. 

Figure 48 (SEQ ID NO:77) depicts the wild-type nucleotide sequence of
8_5_TV1_C.ZA RT.

Figure 49 (SEQ ID NO:78) depicts the nucleotide sequence of a synthetic RT-encoding
polynucleotide derived from 8_5_TV1_C.ZA. The synthetic polynucleotide
includes a mutation in the RT coding sequence which renders the gene product inactive.

Figure 50 (SEQ ID NO:79) depicts the wild-type nucleotide sequence of
8_5_TV1_C.ZA RT including a mutation which inactivates the RT gene product.
Figure 51 (SEQ ID NO:80) depicts the nucleotide sequence of a synthetic Tat-encoding
polynucleotide derived from 8_5_TV1_C.ZA. The synthetic sequence depicted
corresponds to exon 1 of Tat and further includes a mutation that renders the Tat gene
product non-functional. Wild-type Tat has two exons.

Figure 52 (SEQ ID NO:81) depicts the nucleotide sequence of a synthetic Tat-encoding
polynucleotide derived from 8_5_TV1_C.ZA. The synthetic sequence depicted
corresponds to exon 1 of Tat.

Figure 53 (SEQ ID NO:82) depicts the wild-type nucleotide sequence of
8_5_TV1_C.ZA exon 1 of Tat.

Figure 54 (SEQ ID NO:83) depicts the nucleotide sequence of a synthetic Tat-encoding
polynucleotide derived from 8_5_TV1_C.ZA. The synthetic sequence depicted
corresponds to exon 2 of Tat.
Figure 55 (SEQ ID NO:84) depicts the wild-type nucleotide sequence of
8_5_TV1_C.ZA exon 2 of Tat.

Figure 56 (SEQ ID NO:85) depicts the nucleotide sequence of a synthetic Vif- 
encoding polynucleotide derived from 8_5_TV1_C.ZA. 
Figure 57 (SEQ ID NO:86) depicts the wild-type nucleotide sequence of
8_5_TV1_C.ZA Vif.

Figure 58 (SEQ ID NO:87) depicts the nucleotide sequence of a synthetic Vpr-encoding
polynucleotide derived from 8_5_TV1_C.ZA.

Figure 59 (SEQ ID NO:88) depicts the wild-type nucleotide sequence of
8_5_TV1_C.ZA Vpr.

Figure 60 (SEQ ID NO:89) depicts the nucleotide sequence of a synthetic Vpu-encoding
polynucleotide derived from 8_5_TV1_C.ZA.

Figure 61 (SEQ ID NO:90) depicts the wild-type nucleotide sequence of
8_5_TV1_C.ZA Vpu.

Figure 62 (SEQ ID NO:91) depicts the nucleotide sequence of a synthetic rev-encoding
polynucleotide derived from 8_5_TV1_C.ZA. The synthetic sequence depicted
corresponds to exons 1 and 2 of rev.

Figure 63 (SEQ ID NO:92) depicts the wild-type nucleotide sequence of exons 1 and
2 of rev derived from 8_5_TV1_C.ZA.
Figure 64 (SEQ ID NO:93) depicts the nucleotide sequence of a synthetic Tat-encoding
polynucleotide derived from 8_5_TV1_C.ZA. The synthetic polynucleotide
includes both exons 1 and 2 of Tat and further includes a mutation in exon 1 which renders
the gene product non-functional.

Figure 65 (SEQ ID NO:94) depicts the nucleotide sequence of a synthetic Tat-encoding
polynucleotide derived from 8_5_TV1_C.ZA. The synthetic polynucleotide
includes both exons 1 and 2 of Tat.

Figure 66 (SEQ ID NO:95) depicts the wild-type nucleotide sequence of exons 1 and
2 of Tat derived from 8_5_TV1_C.ZA.

Figure 67 (SEQ ID NO:96) depicts the nucleotide sequence of a synthetic Nef-encoding
polynucleotide derived from 8_5_TV1_C.ZA. The sequence includes a mutation at
position 125 which results in a non-functional gene product and a mutation that eliminates the
myristoylation site of the Nef gene product.
Figure 68 (SEQ ID NO:97) depicts the nucleotide sequence of a synthetic Env
gp160-encoding polynucleotide derived from 12-5_1_TV2_C.ZA.

Figure 69 (SEQ ID NO:98) depicts the wild-type nucleotide sequence of Env gp160
derived from 12-5_1_TV2_C.ZA.

Figure 70 (SEQ ID NO:99) depicts the nucleotide sequence of a synthetic Gag- 
encoding polynucleotide derived from 12-5_1_TV2_C.ZA. 

Figure 71 (SEQ ID NO: 100) depicts the wild-type nucleotide sequence of Gag 
derived from 12-5_1_TV2_C.ZA. 

Figure 72 (SEQ ID NO:101) depicts the nucleotide sequence of a synthetic Nef- 
encoding polynucleotide derived from 12-5_1_TV2_C.ZA. 

Figure 73 (SEQ ID NO:102) depicts the wild-type nucleotide sequence of Nef derived
from 12-5_1_TV2_C.ZA.

Figure 74 (SEQ ID NO: 103) depicts the nucleotide sequence of a synthetic Pol- 
encoding polynucleotide derived from 12-5_1_TV2_C.ZA. 

Figure 75 (SEQ ID NO:104) depicts the wild-type nucleotide sequence of Pol derived
from 12-5_1_TV2_C.ZA.

Figure 76 (SEQ ID NO:105) depicts the nucleotide sequence of a synthetic Rev- 
encoding polynucleotide derived from exon 1 of Rev from 12-5_1_TV2_C.ZA. 

Figure 77 (SEQ ID NO:106) depicts the wild-type nucleotide sequence of exon 1 of
Rev derived from 12-5_1_TV2_C.ZA.

Figure 78 (SEQ ID NO: 107) depicts the nucleotide sequence of a synthetic Rev- 
encoding polynucleotide derived from exon 2 of Rev from 12-5_1_TV2_C.ZA. 

Figure 79 (SEQ ID NO:108) depicts the wild-type nucleotide sequence of exon 2 of
Rev derived from 12-5_1_TV2_C.ZA.

Figure 80 (SEQ ID NO: 109) depicts the nucleotide sequence of a synthetic Tat- 
encoding polynucleotide derived from exon 1 of Tat from 12-5_1_TV2_C.ZA. 

Figure 81 (SEQ ID NO:110) depicts the wild-type nucleotide sequence of exon 1 of
Tat derived from 12-5_1_TV2_C.ZA.

Figure 82 (SEQ ID NO: 111) depicts the nucleotide sequence of a synthetic Tat- 
encoding polynucleotide derived from exon 2 of Tat from 12-5_1_TV2_C.ZA. 

Figure 83 (SEQ ID NO:112) depicts the wild-type nucleotide sequence of exon 2 of
Tat derived from 12-5_1_TV2_C.ZA.
Figure 84 (SEQ ID NO:113) depicts the nucleotide sequence of a synthetic Vif-encoding
polynucleotide derived from 12-5_1_TV2_C.ZA.

Figure 85 (SEQ ID NO:114) depicts the wild-type nucleotide sequence of Vif derived
from 12-5_1_TV2_C.ZA.
Figure 86 (SEQ ID NO:115) depicts the nucleotide sequence of a synthetic Vpr-encoding
polynucleotide derived from 12-5_1_TV2_C.ZA.

Figure 87 (SEQ ID NO: 116) depicts the wild-type nucleotide sequence of Vpr derived 
from 12-5_1_TV2_C.ZA. 

Figure 88 (SEQ ID NO:117) depicts the nucleotide sequence of a synthetic Vpu-encoding
polynucleotide derived from 12-5_1_TV2_C.ZA.

Figure 89 (SEQ ID NO:118) depicts the wild-type nucleotide sequence of Vpu
derived from 12-5_1_TV2_C.ZA.

Figure 90 (SEQ ID NO:119) depicts the nucleotide sequence of a synthetic Env
gp120-encoding polynucleotide derived from 8_2_TV1_C.ZA. The V2 region is deleted.
The sequence includes: an EcoRI restriction site (nucleotides 1 to 6); a codon-modified signal
peptide leader sequence (nucleotides 7 to 87); a gp120 coding sequence (nucleotides 88 to
1464); a stop codon (nucleotides 1465 to 1467); an XhoI restriction site (nucleotides 1468 to
1473).
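Each Env construct in Figures 90 to 98 is described as a contiguous tiling of a 5' restriction site, a leader, a coding sequence, a stop codon and a 3' restriction site. The Python sketch below shows one way such a layout could be checked against a sequence string: the regions must tile the sequence without gaps, the flanks must match the EcoRI (GAATTC) and XhoI (CTCGAG) recognition sequences, the stop region must be TAA, TAG or TGA, and the leader plus coding sequence must form an in-frame reading frame with no internal stop. The function name is hypothetical, and the demonstration sequence is a placeholder with the Figure 90 geometry (its leading ATG and GCT filler codons are assumptions); it is not SEQ ID NO:119.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence
XHOI_SITE = "CTCGAG"   # XhoI recognition sequence
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Figure 90 (SEQ ID NO:119) layout as (name, start, end), 1-based and inclusive.
FIG90_LAYOUT = [
    ("EcoRI", 1, 6),
    ("signal peptide", 7, 87),
    ("gp120", 88, 1464),
    ("stop", 1465, 1467),
    ("XhoI", 1468, 1473),
]


def check_construct(seq: str, layout: list) -> list:
    """Return a list of problems; an empty list means the layout checks out."""
    problems = []
    spans = {}
    next_start = 1
    for name, start, end in layout:
        if start != next_start:
            problems.append(f"{name} starts at {start}, expected {next_start}")
        spans[name] = seq[start - 1:end]  # 1-based inclusive -> Python slice
        next_start = end + 1
    if next_start - 1 != len(seq):
        problems.append(f"layout covers {next_start - 1} nt but sequence has {len(seq)} nt")
    if spans["EcoRI"] != ECORI_SITE:
        problems.append("EcoRI site not found at the expected position")
    if spans["XhoI"] != XHOI_SITE:
        problems.append("XhoI site not found at the expected position")
    if spans["stop"] not in STOP_CODONS:
        problems.append(f"{spans['stop']} is not a stop codon")
    orf = spans["signal peptide"] + spans["gp120"]
    if len(orf) % 3 != 0:
        problems.append("leader plus coding sequence is not a whole number of codons")
    elif any(orf[i:i + 3] in STOP_CODONS for i in range(0, len(orf), 3)):
        problems.append("premature stop codon in the reading frame")
    return problems


# Placeholder sequence with the Figure 90 geometry (NOT the real SEQ ID NO:119).
demo = ECORI_SITE + "ATG" + "GCT" * 26 + "GCT" * 459 + "TAA" + XHOI_SITE
print(check_construct(demo, FIG90_LAYOUT))
```

Because the layout is plain data, the same check could be pointed at the other construct descriptions by swapping in their coordinates and flanking sites (for example, a SalI site for Figure 96).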

Figure 91 (SEQ ID NO:120) depicts the nucleotide sequence of a synthetic Env
gp140-encoding polynucleotide derived from 8_2_TV1_C.ZA. The V2 region is deleted.
The sequence includes: an EcoRI restriction site (nucleotides 1 to 6); a modified signal
peptide leader sequence (nucleotides 7 to 87); a gp140 coding sequence (nucleotides 88 to
1977); a stop codon (nucleotides 1978 to 1980); an XhoI restriction site (nucleotides 1981 to
1986).

Figure 92 (SEQ ID NO:121) depicts the nucleotide sequence of a synthetic Env
gp140-encoding polynucleotide derived from 8_2_TV1_C.ZA. The V2 region is deleted and
the sequence includes mutations in the cleavage site that prevent the cleavage of a gp140
polypeptide into a gp120 polypeptide and a gp41 polypeptide. The sequence includes: an
EcoRI restriction site (nucleotides 1 to 6); a modified signal peptide leader sequence
(nucleotides 7 to 87); a gp140 coding sequence (nucleotides 88 to 1977); a stop codon
(nucleotides 1978 to 1980); an XhoI restriction site (nucleotides 1981 to 1986).
Figure 93 (SEQ ID NO:122) depicts the nucleotide sequence of a synthetic Env
gp160-encoding polynucleotide derived from 8_2_TV1_C.ZA. The V1/V2 regions are
deleted. The sequence includes: an EcoRI restriction site (nucleotides 1 to 6); a modified
signal peptide leader sequence (nucleotides 7 to 87); a gp160 coding sequence (nucleotides 88
to 2388); a stop codon (nucleotides 2389 to 2391); an XhoI restriction site (nucleotides 2392
to 2397).

Figure 94 (SEQ ID NO:123) depicts the nucleotide sequence of a synthetic Env
gp160-encoding polynucleotide derived from 8_2_TV1_C.ZA. The V2 region is deleted.
The sequence includes: an EcoRI restriction site (nucleotides 1 to 6); a modified signal
peptide leader sequence (nucleotides 7 to 87); a gp160 coding sequence (nucleotides 88 to
2520); a stop codon (nucleotides 2521 to 2523); an XhoI restriction site (nucleotides 2524 to
2529).

Figure 95 (SEQ ID NO:124) depicts the nucleotide sequence of a synthetic Env
gp160-encoding polynucleotide derived from 8_2_TV1_C.ZA. The V2 region is deleted and
the cleavage site is mutated. The sequence includes: an EcoRI restriction site (nucleotides 1
to 6); a modified signal peptide leader sequence (nucleotides 7 to 87); a gp160 coding
sequence (nucleotides 88 to 2520); a stop codon (nucleotides 2521 to 2523); an XhoI
restriction site (nucleotides 2524 to 2529).

Figure 96 (SEQ ID NO:125) depicts the nucleotide sequence of a synthetic Env
gp160-encoding polynucleotide derived from 8_2_TV1_C.ZA. The nucleotide sequence
includes a TPA1 leader sequence. The sequence includes: a SalI restriction site (nucleotides
1 to 6); a Kozak sequence (nucleotides 7 to 12); a TPA1 signal peptide leader sequence
(nucleotides 13 to 87); a gp160 coding sequence (nucleotides 88 to 2604); a stop codon
(nucleotides 2605 to 2607); an XhoI restriction site (nucleotides 2608 to 2613).

Figure 97 (SEQ ID NO:126) depicts the nucleotide sequence of a synthetic Env
gp160-encoding polynucleotide derived from 8_2_TV1_C.ZA. The sequence includes: an
EcoRI restriction site (nucleotides 1 to 6); a modified signal peptide leader sequence
(nucleotides 7 to 87); a gp160 coding sequence (nucleotides 88 to 2607); a stop codon
(nucleotides 2608 to 2610); an XhoI restriction site (nucleotides 2611 to 2616).

Figure 98 (SEQ ID NO:127) depicts the nucleotide sequence of a synthetic Env
gp160-encoding polynucleotide derived from 8_2_TV1_C.ZA. The nucleotide sequence
includes a wild-type leader sequence. The sequence includes: an EcoRI restriction site
(nucleotides 1 to 6); a native (unmodified) signal peptide leader sequence (nucleotides 7 to
87); a gp160 coding sequence (nucleotides 88 to 2607); a stop codon (nucleotides 2608 to
2610); an XhoI restriction site (nucleotides 2611 to 2616).

Figure 99 (SEQ ID NO:128) depicts the nucleotide sequence of wild-type gp160
derived from 8_2_TV1_C.ZA.

Figure 100 (SEQ ID NO:131) depicts the nucleotide sequence of a synthetic Env
gp140-encoding polynucleotide derived from 8_2_TV1_C.ZA. The nucleotide sequence
includes a TPA1 leader sequence (nucleotides 1-75); a gp140 coding sequence (nucleotides
76 to 2049); a stop codon (nucleotides 2050 to 2052).

Figure 101 (SEQ ID NO:132) depicts the nucleotide sequence of a synthetic gp140-encoding
polynucleotide derived from 8_2_TV1_C.ZA. The nucleotide sequence includes an
EcoRI restriction site (nucleotides 1 to 6); a leader sequence modified from the TV1_C.ZA
wild-type leader sequence (nucleotides 7 to 87); a gp140 coding sequence (nucleotides 88 to
2064); a stop codon (nucleotides 2065 to 2067); an XhoI restriction site (nucleotides 2068 to
2073).

Figure 102 (SEQ ID NO:133) depicts the nucleotide sequence of a synthetic gp140-encoding
polynucleotide derived from 8_2_TV1_C.ZA. The nucleotide sequence includes the
unmodified wild-type TV1_C.ZA leader sequence. The nucleotide sequence includes a
restriction site (nucleotides 1 to 6); a wild-type leader sequence (nucleotides 7 to 87); a gp140
coding sequence (nucleotides 88 to 2064); a stop codon (nucleotides 2065 to 2067); an XhoI
restriction site (nucleotides 2068 to 2073).

Figure 103 (SEQ ID NO:134) depicts the nucleotide sequence of a synthetic Nef-encoding
polynucleotide derived from 12-5_1_TV2_C.ZA. The sequence includes a
mutation at position 125 which results in a non-functional gene product.

Figure 104 (SEQ ID NO:135) depicts the nucleotide sequence of a synthetic Nef-encoding
polynucleotide derived from 12-5_1_TV2_C.ZA. The synthetic polynucleotide
includes a mutation that eliminates the myristoylation site of the Nef gene product.

Figure 105 depicts an alignment of Env polypeptides from various HIV isolates. The
regions between the arrows indicate regions (of TV1 and TV2 clones) in the beta and/or
bridging sheet region(s) that can be deleted and/or truncated. The "*" denotes N-linked
glycosylation sites (of TV1 and TV2 clones), one or more of which can be modified (e.g.,
deleted and/or mutated).
Detailed Description of the Invention

The practice of the present invention will employ, unless otherwise indicated,
conventional methods of chemistry, biochemistry, molecular biology, immunology and
pharmacology, within the skill of the art. Such techniques are explained fully in the
literature. See, e.g., Remington's Pharmaceutical Sciences, 18th Edition (Easton,
Pennsylvania: Mack Publishing Company, 1990); Methods In Enzymology (S. Colowick and
N. Kaplan, eds., Academic Press, Inc.); Handbook of Experimental Immunology, Vols.
I-IV (D.M. Weir and C.C. Blackwell, eds., 1986, Blackwell Scientific Publications);
Sambrook, et al., Molecular Cloning: A Laboratory Manual (2nd Edition, 1989); Short
Protocols in Molecular Biology, 4th ed. (Ausubel et al., eds., 1999, John Wiley & Sons);
Molecular Biology Techniques: An Intensive Laboratory Course (Ream et al., eds., 1998,
Academic Press); PCR (Introduction to Biotechniques Series), 2nd ed. (Newton & Graham,
eds., 1997, Springer Verlag).

As used in this specification and the appended claims, the singular forms "a," "an"
and "the" include plural references unless the content clearly dictates otherwise. Thus, for
example, reference to "an antigen" includes a mixture of two or more such agents.

1. Definitions 

In describing the present invention, the following terms will be employed, and are
intended to be defined as indicated below.

"Synthetic" sequences, as used herein, refer to Type C HIV polypeptide-encoding
polynucleotides whose expression has been modified as described herein, for example, by
codon substitution and inactivation of inhibitory sequences. "Wild-type" or "native"
sequences, as used herein, refer to polypeptide-encoding sequences that are essentially as
they are found in nature, e.g., Gag, Pol, Vif, Vpr, Tat, Rev, Vpu, Env and/or Nef encoding
sequences as found in Type C isolates, e.g., AF110965, AF110967, AF110968, AF110975,
8_5_TV1_C.ZA, 8_2_TV1_C.ZA or 12-5_1_TV2_C.ZA. The various regions of the HIV
genome are shown in Table A, with numbering relative to 8_5_TV1_C.ZA (SEQ ID NO:33).
Thus, the term "Pol" refers to one or more of the following polypeptides: polymerase (p6Pol);
protease (prot); reverse transcriptase (p66RT or RT); RNAseH (p15RNAseH); and/or
integrase (p31Int or Int).
As used herein, the term "virus-like particle" or "VLP" refers to a nonreplicating viral
shell, derived from any of several viruses discussed further below. VLPs are
generally composed of one or more viral proteins, such as, but not limited to, those proteins
referred to as capsid, coat, shell, surface and/or envelope proteins, or particle-forming
polypeptides derived from these proteins. VLPs can form spontaneously upon recombinant
expression of the protein in an appropriate expression system. Methods for producing
particular VLPs are known in the art and discussed more fully below. The presence of VLPs
following recombinant expression of viral proteins can be detected using conventional
techniques known in the art, such as by electron microscopy, X-ray crystallography, and the
like. See, e.g., Baker et al., Biophys. J. (1991) 60:1445-1456; Hagensee et al., J. Virol.
(1994) 68:4503-4505. For example, VLPs can be isolated by density gradient centrifugation
and/or identified by characteristic density banding. Alternatively, cryoelectron microscopy
can be performed on vitrified aqueous samples of the VLP preparation in question, and
images recorded under appropriate exposure conditions.

By "particle-forming polypeptide" derived from a particular viral protein is meant a
full-length or near full-length viral protein, as well as a fragment thereof, or a viral protein
with internal deletions, which has the ability to form VLPs under conditions that favor VLP
formation. Accordingly, the polypeptide may comprise the full-length sequence, fragments,
truncated and partial sequences, as well as analogs and precursor forms of the reference
molecule. The term therefore intends deletions, additions and substitutions to the sequence,
so long as the polypeptide retains the ability to form a VLP. Thus, the term includes natural
variations of the specified polypeptide, since variations in coat proteins often occur between
viral isolates. The term also includes deletions, additions and substitutions that do not
naturally occur in the reference protein, so long as the protein retains the ability to form a
VLP. Preferred substitutions are those which are conservative in nature, i.e., those
substitutions that take place within a family of amino acids that are related in their side
chains. Specifically, amino acids are generally divided into four families: (1) acidic --
aspartate and glutamate; (2) basic -- lysine, arginine, histidine; (3) non-polar -- alanine,
valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan; and (4) uncharged
polar -- glycine, asparagine, glutamine, cysteine, serine, threonine, tyrosine. Phenylalanine,
tryptophan, and tyrosine are sometimes classified as aromatic amino acids.
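The four-family grouping above can be expressed as a small lookup table. The following sketch is illustrative only: the names `FAMILIES`, `family_of` and `is_conservative` are hypothetical and not part of the disclosure, a substitution is treated as conservative when both residues fall in the same family listed above, and the optional aromatic grouping is ignored.

```python
# Side-chain families as listed in the definition (one-letter codes).
FAMILIES = {
    "acidic": set("DE"),
    "basic": set("KRH"),
    "non-polar": set("AVLIPFMW"),
    "uncharged polar": set("GNQCSTY"),
}


def family_of(residue: str) -> str:
    """Return the family name for a one-letter amino acid code."""
    code = residue.upper()
    for name, members in FAMILIES.items():
        if code in members:
            return name
    raise ValueError(f"unknown amino acid code: {residue!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """A substitution is conservative here if both residues share a family."""
    return family_of(original) == family_of(replacement)


print(is_conservative("D", "E"))  # aspartate -> glutamate, both acidic
print(is_conservative("K", "A"))  # lysine -> alanine, basic -> non-polar
```

Run as written, this prints True and then False.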
An "antigen" refers to a molecule containing one or more epitopes (either linear,
conformational or both) that will stimulate a host's immune system to make a humoral and/or
cellular antigen-specific response. The term is used interchangeably with the term
"immunogen." Normally, a B-cell epitope will include at least about 5 amino acids but can
be as small as 3-4 amino acids. A T-cell epitope, such as a CTL epitope, will include at least
about 7-9 amino acids, and a helper T-cell epitope at least about 12-20 amino acids.
Normally, an epitope will include between about 7 and 15 amino acids, such as 9, 10, 12 or
15 amino acids. The term "antigen" denotes both subunit antigens (i.e., antigens which are
separate and discrete from a whole organism with which the antigen is associated in nature),
as well as killed, attenuated or inactivated bacteria, viruses, fungi, parasites or other
microbes. Antibodies such as anti-idiotype antibodies, or fragments thereof, and synthetic
peptide mimotopes, which can mimic an antigen or antigenic determinant, are also captured
under the definition of antigen as used herein. Similarly, an oligonucleotide or
polynucleotide which expresses an antigen or antigenic determinant in vivo, such as in gene
therapy and DNA immunization applications, is also included in the definition of antigen
herein.

For purposes of the present invention, antigens can be derived from any of several
known viruses, bacteria, parasites and fungi, as described more fully below. The term also
intends any of the various tumor antigens. Furthermore, for purposes of the present
invention, an "antigen" refers to a protein which includes modifications, such as deletions,
additions and substitutions (generally conservative in nature), to the native sequence, so long
as the protein maintains the ability to elicit an immunological response, as defined herein.
These modifications may be deliberate, as through site-directed mutagenesis, or may be
accidental, such as through mutations of hosts which produce the antigens.

An "immunological response" to an antigen or composition is the development in a
subject of a humoral and/or a cellular immune response to an antigen present in the
composition of interest. For purposes of the present invention, a "humoral immune response"
refers to an immune response mediated by antibody molecules, while a "cellular immune
response" is one mediated by T-lymphocytes and/or other white blood cells. One important
aspect of cellular immunity involves an antigen-specific response by cytolytic T-cells
("CTLs"). CTLs have specificity for peptide antigens that are presented in association with
proteins encoded by the major histocompatibility complex (MHC) and expressed on the
surfaces of cells. CTLs help induce and promote the destruction of intracellular microbes, or
the lysis of cells infected with such microbes. Another aspect of cellular immunity involves
an antigen-specific response by helper T-cells. Helper T-cells act to help stimulate the
function, and focus the activity of, nonspecific effector cells against cells displaying peptide
antigens in association with MHC molecules on their surface. A "cellular immune response"
also refers to the production of cytokines, chemokines and other such molecules produced by
activated T-cells and/or other white blood cells, including those derived from CD4+ and
CD8+ T-cells.

A composition or vaccine that elicits a cellular immune response may serve to 
sensitize a vertebrate subject by the presentation of antigen in association with MHC 
molecules at the cell surface. The cell-mediated immune response is directed at, or near, cells 
presenting antigen at their surface. In addition, antigen-specific T-lymphocytes can be 
generated to allow for the future protection of an immunized host. 

The ability of a particular antigen to stimulate a cell-mediated immunological
response may be determined by a number of assays, such as by lymphoproliferation
(lymphocyte activation) assays, CTL cytotoxic cell assays, or by assaying for T-lymphocytes
specific for the antigen in a sensitized subject. Such assays are well known in the art. See,
e.g., Erickson et al., J. Immunol. (1993) 151:4189-4199; Doe et al., Eur. J. Immunol. (1994)
24:2369-2376. Recent methods of measuring cell-mediated immune response include
measurement of intracellular cytokines or cytokine secretion by T-cell populations, or
measurement of epitope-specific T-cells (e.g., by the tetramer technique) (reviewed by
McMichael, A.J., and O'Callaghan, C.A., J. Exp. Med. 187(9):1367-1371, 1998;
McHeyzer-Williams, M.G., et al., Immunol. Rev. 150:5-21, 1996; Lalvani, A., et al., J. Exp.
Med. 186:859-865, 1997).

Thus, an immunological response as used herein may be one which stimulates the
production of CTLs, and/or the production or activation of helper T-cells. The antigen of
interest may also elicit an antibody-mediated immune response. Hence, an immunological
response may include one or more of the following effects: the production of antibodies by
B-cells; and/or the activation of suppressor T-cells and/or γδ T-cells directed specifically to an
antigen or antigens present in the composition or vaccine of interest. These responses may
serve to neutralize infectivity, and/or mediate antibody-complement or antibody-dependent
cell cytotoxicity (ADCC) to provide protection to an immunized host. Such responses can be
determined using standard immunoassays and neutralization assays, well known in the art.

An "immunogenic composition" is a composition that comprises an antigenic 
molecule where administration of the composition to a subject results in the development in 
the subject of a humoral and/or a cellular immune response to the antigenic molecule of 
interest. The immunogenic composition can be introduced directly into a recipient subject, 
such as by injection, inhalation, oral, intranasal and mucosal (e.g., intra-rectally or intra- 
vaginally) administration. 

By "subunit vaccine" is meant a vaccine composition which includes one or more 
selected antigens but not all antigens, derived from or homologous to, an antigen from a 
pathogen of interest such as from a virus, bacterium, parasite or fungus. Such a composition 
is substantially free of intact pathogen cells or pathogenic particles, or the lysate of such cells 
or particles. Thus, a "subunit vaccine" can be prepared from at least partially purified 
(preferably substantially purified) immunogenic polypeptides from the pathogen, or analogs 
thereof. The method of obtaining an antigen included in the subunit vaccine can thus include 
standard purification techniques, recombinant production, or synthetic production.

"Substantially purified" generally refers to isolation of a substance (compound,
polynucleotide, protein, polypeptide, polypeptide composition) such that the substance
comprises the majority of the sample in which it resides. Typically, in a sample, a
substantially purified component comprises 50%, preferably 80%-85%, more preferably
90-95% of the sample. Techniques for purifying polynucleotides and polypeptides of interest are
well known in the art and include, for example, ion-exchange chromatography, affinity
chromatography and sedimentation according to density.

A "coding sequence" or a sequence which "encodes" a selected polypeptide, is a 
nucleic acid molecule which is transcribed (in the case of DNA) and translated (in the case of 
mRNA) into a polypeptide in vivo when placed under the control of appropriate regulatory 
sequences (or "control elements"). The boundaries of the coding sequence are determined by 
a start codon at the 5' (amino) terminus and a translation stop codon at the 3' (carboxy) 
terminus. A coding sequence can include, but is not limited to, cDNA from viral, procaryotic 
or eucaryotic mRNA, genomic DNA sequences from viral or procaryotic DNA, and even 
synthetic DNA sequences. A transcription termination sequence such as a stop codon may be 
located 3' to the coding sequence. 
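As an illustration of how a start codon and a stop codon bound a coding sequence, the following minimal sketch scans a DNA string for the first ATG that is followed by an in-frame stop codon (TAA, TAG or TGA; the standard genetic code is assumed). It is a simplification that reads one strand only and ignores introns and alternative start codons; `find_coding_sequence` is a hypothetical helper name.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)


def find_coding_sequence(dna: str) -> Optional[str]:
    """Return the first ATG...stop stretch read in frame, stop codon included."""
    seq = dna.upper()
    start = seq.find("ATG")
    while start != -1:
        # Walk codon by codon from this start until an in-frame stop appears.
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return seq[start:i + 3]
        start = seq.find("ATG", start + 1)  # no in-frame stop: try the next ATG
    return None


print(find_coding_sequence("ggATGAAACCCTGAgg"))  # ATGAAACCCTGA
```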
Typical "control elements" include, but are not limited to, transcription promoters,
transcription enhancer elements, transcription termination signals, polyadenylation sequences
(located 3' to the translation stop codon), sequences for optimization of initiation of
translation (located 5' to the coding sequence), and translation termination sequences.

A "polynucleotide coding sequence" or a sequence which "encodes" a selected
polypeptide is a nucleic acid molecule which is transcribed (in the case of DNA) and
translated (in the case of mRNA) into a polypeptide in vivo when placed under the control of
appropriate regulatory sequences (or "control elements"). The boundaries of the coding
sequence are determined by a start codon at the 5' (amino) terminus and a translation stop
codon at the 3' (carboxy) terminus. Exemplary coding sequences are the modified viral
polypeptide-coding sequences of the present invention. A transcription termination sequence
may be located 3' to the coding sequence. Typical "control elements" include, but are not
limited to, transcription regulators, such as promoters, transcription enhancer elements,
transcription termination signals, and polyadenylation sequences; and translation regulators,
such as sequences for optimization of initiation of translation, e.g., Shine-Dalgarno (ribosome
binding site) sequences, Kozak sequences (i.e., sequences for the optimization of translation,
located, for example, 5' to the coding sequence), leader sequences, translation initiation
codon (e.g., ATG), and translation termination sequences. In certain embodiments, one or
more translation regulation or initiation sequences (e.g., the leader sequence) are derived
from wild-type translation initiation sequences, i.e., sequences that regulate translation of the
coding region in their native state. Wild-type leader sequences that have been modified,
using the methods described herein, also find use in the present invention. Promoters can
include inducible promoters (where expression of a polynucleotide sequence operably linked
to the promoter is induced by an analyte, cofactor, regulatory protein, etc.), repressible
promoters (where expression of a polynucleotide sequence operably linked to the promoter is
repressed by an analyte, cofactor, regulatory protein, etc.), and constitutive promoters.

A "nucleic acid" molecule can include, but is not limited to, procaryotic sequences, 
eucaryotic mRNA, cDNA from eucaryotic mRNA, genomic DNA sequences from eucaryotic 
(e.g., mammalian) DNA, and even synthetic DNA sequences. The term also captures 
sequences that include any of the known base analogs of DNA and RNA.

"Operably linked" refers to an arrangement of elements wherein the components so
described are configured so as to perform their usual function. Thus, a given promoter
operably linked to a coding sequence is capable of effecting the expression of the coding
sequence when the proper enzymes are present. The promoter need not be contiguous with
the coding sequence, so long as it functions to direct the expression thereof. Thus, for
example, intervening untranslated yet transcribed sequences can be present between the
promoter sequence and the coding sequence and the promoter sequence can still be
considered "operably linked" to the coding sequence.

"Recombinant" as used herein to describe a nucleic acid molecule means a
polynucleotide of genomic, cDNA, semisynthetic, or synthetic origin which, by virtue of its
origin or manipulation: (1) is not associated with all or a portion of the polynucleotide with
which it is associated in nature; and/or (2) is linked to a polynucleotide other than that to
which it is linked in nature. The term "recombinant" as used with respect to a protein or
polypeptide means a polypeptide produced by expression of a recombinant polynucleotide.
"Recombinant host cells," "host cells," "cells," "cell lines," "cell cultures," and other such
terms denoting procaryotic microorganisms or eucaryotic cell lines cultured as unicellular
entities, are used interchangeably, and refer to cells which can be, or have been, used as
recipients for recombinant vectors or other transfer DNA, and include the progeny of the
original cell which has been transfected. It is understood that the progeny of a single parental
cell may not necessarily be completely identical in morphology or in genomic or total DNA
complement to the original parent, due to accidental or deliberate mutation. Progeny of the
parental cell which are sufficiently similar to the parent to be characterized by the relevant
property, such as the presence of a nucleotide sequence encoding a desired peptide, are
included in the progeny intended by this definition, and are covered by the above terms.

Techniques for determining amino acid sequence "similarity" are well known in the
art. In general, "similarity" means the exact amino acid to amino acid comparison of two or
more polypeptides at the appropriate place, where amino acids are identical or possess similar
chemical and/or physical properties such as charge or hydrophobicity. A so-termed "percent
similarity" then can be determined between the compared polypeptide sequences.
Techniques for determining nucleic acid and amino acid sequence identity also are well
known in the art and include determining the nucleotide sequence of the mRNA for that gene
(usually via a cDNA intermediate) and determining the amino acid sequence encoded
thereby, and comparing this to a second amino acid sequence. In general, "identity" refers to
an exact nucleotide to nucleotide or amino acid to amino acid correspondence of two
polynucleotides or polypeptide sequences, respectively.
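The distinction between identity and similarity can be shown on two sequences that are already aligned. In the sketch below, "similar" is assumed to mean "in the same side-chain family" as defined earlier for conservative substitutions, gap columns (`-`) count as neither identical nor similar, and the denominator is the alignment length. These are illustrative choices, not the scoring used by any particular program, and `identity_and_similarity` is a hypothetical name.

```python
from typing import Tuple

# Family lookup built from the conservative-substitution grouping (assumption).
FAMILY = {aa: name
          for name, members in (("acidic", "DE"), ("basic", "KRH"),
                                ("non-polar", "AVLIPFMW"),
                                ("uncharged polar", "GNQCSTY"))
          for aa in members}


def identity_and_similarity(aln_a: str, aln_b: str) -> Tuple[float, float]:
    """Percent identity and percent similarity of two pre-aligned sequences."""
    if len(aln_a) != len(aln_b) or not aln_a:
        raise ValueError("aligned sequences must be non-empty and equal length")
    identical = similar = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if x == "-" or y == "-":
            continue  # gap column: neither identical nor similar
        if x == y:
            identical += 1
            similar += 1
        elif x in FAMILY and FAMILY.get(x) == FAMILY.get(y):
            similar += 1  # different residues from the same family
    length = len(aln_a)
    return 100.0 * identical / length, 100.0 * similar / length


print(identity_and_similarity("DKLV", "EKIV"))  # (50.0, 100.0)
```

Here D/E and L/I are similar but not identical, so the pair is 50% identical yet 100% similar.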

Two or more polynucleotide sequences can be compared by determining their 
"percent identity." Two or more amino acid sequences likewise can be compared by 
determining their "percent identity." The percent identity of two sequences, whether nucleic 
acid or peptide sequences, is generally described as the number of exact matches between two 
aligned sequences divided by the length of the shorter sequence and multiplied by 100. An 
approximate alignment for nucleic acid sequences is provided by the local homology 
algorithm of Smith and Waterman, Advances in Applied Mathematics 2:482-489 (1981). 
This algorithm can be extended to use with peptide sequences using the scoring matrix 
developed by Dayhoff, Atlas of Protein Sequences and Structure, M.O. Dayhoff ed., 5 suppl. 
3:353-358, National Biomedical Research Foundation, Washington, D.C., USA, and 
normalized by Gribskov, Nucl. Acids Res. 14(6):6745-6763 (1986). An implementation of
this algorithm for nucleic acid and peptide sequences is provided by the Genetics Computer 
Group (Madison, WI) in their BestFit utility application. The default parameters for this 
method are described in the Wisconsin Sequence Analysis Package Program Manual, Version 
8 (1995) (available from Genetics Computer Group, Madison, WI). Other equally suitable 
programs for calculating the percent identity or similarity between sequences are generally 
known in the art. 
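The percent identity calculation described above (exact matches between the aligned sequences divided by the length of the shorter sequence, multiplied by 100) can be sketched with a textbook Smith-Waterman local alignment. This is not the BestFit program: the scoring values (match +2, mismatch -1, linear gap -2) are arbitrary assumptions, and neither the Dayhoff/Gribskov matrix nor an affine gap penalty is used.

```python
from typing import Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Local alignment with a linear gap penalty; returns (score, aln_a, aln_b)."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    # Trace back from the highest-scoring cell until a zero cell is reached.
    out_a, out_b = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Exact matches in the local alignment / length of the shorter sequence * 100."""
    _, aln_a, aln_b = smith_waterman(a.upper(), b.upper())
    matches = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * matches / min(len(a), len(b))


print(percent_identity("AAAACGTAAAA", "CGT"))  # 100.0
```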

For example, percent identity of a particular nucleotide sequence to a reference 
sequence can be determined using the homology algorithm of Smith and Waterman with a 
default scoring table and a gap penalty of six nucleotide positions. Another method of 
establishing percent identity in the context of the present invention is to use the MPSRCH 
package of programs copyrighted by the University of Edinburgh, developed by John F. 
Collins and Shane S. Sturrok, and distributed by IntelliGenetics, Inc. (Mountain View, CA). 
From this suite of packages, the Smith-Waterman algorithm can be employed where default 
parameters are used for the scoring table (for example, gap open penalty of 12, gap extension 
penalty of one, and a gap of six). From the data generated, the "Match" value reflects
"sequence identity." Other suitable programs for calculating the percent identity or similarity
between sequences are generally known in the art, such as the alignment program BLAST,
which can also be used with default parameters. For example, BLASTN and BLASTP can be
used with the following default parameters: genetic code = standard; filter = none; strand =
both; cutoff = 60; expect = 10; Matrix = BLOSUM62; Descriptions = 50 sequences; sort by =
HIGH SCORE; Databases = non-redundant, GenBank + EMBL + DDBJ + PDB + GenBank
CDS translations + Swiss protein + Spupdate + PIR. Details of these programs can be found
at the following internet address: http://www.ncbi.nlm.gov/cgi-bin/BLAST.

One of skill in the art can readily determine the proper search parameters to use for a
given sequence; exemplary preferred Smith-Waterman based parameters are presented above.
For example, the search parameters may vary based on the size of the sequence in question. 
Thus, for the polynucleotide sequences of the present invention the length of the
polynucleotide sequence disclosed herein is searched against a selected database and
compared to sequences of essentially the same length to determine percent identity. For
example, a representative embodiment of the present invention would include an isolated
polynucleotide having X contiguous nucleotides, wherein (i) the X contiguous nucleotides
have at least about a selected level of percent identity relative to Y contiguous nucleotides of
the sequences described herein, and (ii) for search purposes X equals Y, wherein Y is a
selected reference polynucleotide of defined length.
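The "X equals Y" comparison can be pictured as sliding a query of X contiguous nucleotides along a reference and scoring each window of the same length. The sketch below is an ungapped simplification (database search tools such as BLAST also allow gaps), and `best_window_identity` is a hypothetical name.

```python
def best_window_identity(query: str, reference: str) -> float:
    """Best ungapped percent identity of `query` over same-length windows of `reference`."""
    x = len(query)
    if x == 0 or x > len(reference):
        raise ValueError("query must be non-empty and no longer than the reference")
    q = query.upper()
    ref = reference.upper()
    best = 0.0
    for start in range(len(ref) - x + 1):
        window = ref[start:start + x]  # Y contiguous nucleotides, with Y == X
        matches = sum(a == b for a, b in zip(q, window))
        best = max(best, 100.0 * matches / x)
    return best


print(best_window_identity("ACGA", "TTACGTTT"))  # 75.0
```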

The sequences of the present invention can include fragments of the sequences, for
example, from about 15 nucleotides up to the number of nucleotides present in the full-length
sequences described herein (e.g., see the Sequence Listing, Figures, and claims), including all
integer values falling within the above-described range. For example, fragments of the
polynucleotide sequences of the present invention may be 30-60 nucleotides, 60-120
nucleotides, 120-240 nucleotides, 240-480 nucleotides, 480-1000 nucleotides, and all integer
values therebetween.

The synthetic expression cassettes (and purified polynucleotides) of the present
invention include related polynucleotide sequences having about 80% to 100%, greater than
80-85%, preferably greater than 90-92%, more preferably greater than 95%, and most
preferably greater than 98% up to 100% (including all integer values falling within these
described ranges) sequence identity to the synthetic expression cassette (and purified
polynucleotide) sequences disclosed herein (for example, to the claimed sequences or other
sequences of the present invention) when the sequences of the present invention are used as
the query sequence against, for example, a database of sequences.

Two nucleic acid fragments are considered to "selectively hybridize" as described
herein. The degree of sequence identity between two nucleic acid molecules affects the
efficiency and strength of hybridization events between such molecules. A partially identical
nucleic acid sequence will at least partially inhibit a completely identical sequence from
hybridizing to a target molecule. Inhibition of hybridization of the completely identical
sequence can be assessed using hybridization assays that are well known in the art (e.g.,
Southern blot, Northern blot, solution hybridization, or the like; see Sambrook, et al., supra
or Ausubel et al., supra). Such assays can be conducted using varying degrees of selectivity,
for example, using conditions varying from low to high stringency. If conditions of low
stringency are employed, the absence of non-specific binding can be assessed using a
secondary probe that lacks even a partial degree of sequence identity (for example, a probe
having less than about 30% sequence identity with the target molecule), such that, in the
absence of non-specific binding events, the secondary probe will not hybridize to the target.

When utilizing a hybridization-based detection system, a nucleic acid probe is chosen
that is complementary to a target nucleic acid sequence, and then by selection of appropriate
conditions the probe and the target sequence "selectively hybridize," or bind, to each other to
form a hybrid molecule. A nucleic acid molecule that is capable of hybridizing selectively to
a target sequence under "moderately stringent" conditions typically hybridizes under conditions that
allow detection of a target nucleic acid sequence of at least about 10-14 nucleotides in length
having at least approximately 70% sequence identity with the sequence of the selected
nucleic acid probe. Stringent hybridization conditions typically allow detection of target
nucleic acid sequences of at least about 10-14 nucleotides in length having a sequence
identity of greater than about 90-95% with the sequence of the selected nucleic acid probe.
Hybridization conditions useful for probe/target hybridization where the probe and target
have a specific degree of sequence identity can be determined as is known in the art (see, for
example, Nucleic Acid Hybridization: A Practical Approach, editors B.D. Hames and S.J.
Higgins, (1985) Oxford; Washington, DC; IRL Press).

With respect to stringency conditions for hybridization, it is well known in the art that
numerous equivalent conditions can be employed to establish a particular stringency by
varying, for example, the following factors: the length and nature of probe and target
sequences, base composition of the various sequences, concentrations of salts and other
hybridization solution components, the presence or absence of blocking agents in the
hybridization solutions (e.g., formamide, dextran sulfate, and polyethylene glycol),
hybridization reaction temperature and time parameters, as well as varying wash conditions.
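Several of the stringency factors listed above (salt concentration, base composition, formamide and probe length) enter one commonly cited empirical estimate of the melting temperature of a DNA-DNA hybrid, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%G+C) - 0.63(%formamide) - 600/L, as given in laboratory manuals such as Sambrook et al. The sketch below simply evaluates that expression. The coefficients are taken as an assumption (variants with slightly different constants are in use), `estimate_tm` is a hypothetical name, and the result is only a starting point for choosing hybridization and wash temperatures.

```python
import math


def estimate_tm(probe: str, na_molar: float, formamide_pct: float = 0.0) -> float:
    """Empirical Tm estimate (deg C) for a DNA-DNA hybrid; coefficients assumed."""
    seq = probe.upper()
    if not seq or na_molar <= 0:
        raise ValueError("probe must be non-empty and [Na+] must be positive")
    gc_pct = 100.0 * sum(base in "GC" for base in seq) / len(seq)
    return (81.5
            + 16.6 * math.log10(na_molar)   # monovalent cation concentration (M)
            + 0.41 * gc_pct                 # base composition
            - 0.63 * formamide_pct          # destabilizing denaturant
            - 600.0 / len(seq))             # probe length correction


# Example values are arbitrary: a 100-nt probe, 50% G+C, 0.39 M Na+, 50% formamide.
print(round(estimate_tm("GC" * 25 + "AT" * 25, 0.39, 50.0), 1))
```

A widely quoted rule of thumb is that Tm falls by roughly 1 to 1.5 degrees C for each 1% of mismatch, which is why conditions that detect targets of about 70% identity are run well below those that demand 90-95% identity.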






A particular set of hybridization conditions is selected following standard methods in the
art (see, for example, Sambrook, et al., supra or Ausubel et al., supra).

A first polynucleotide is "derived from" a second polynucleotide if it has the same or
substantially the same basepair sequence as a region of the second polynucleotide, its cDNA,
complements thereof, or if it displays sequence identity as described above.

A first polypeptide is "derived from" a second polypeptide if it is (i) encoded by a first
polynucleotide derived from a second polynucleotide, or (ii) displays sequence identity to the
second polypeptides as described above.

Generally, a viral polypeptide is "derived from" a particular polypeptide of a virus
(viral polypeptide) if it is (i) encoded by an open reading frame of a polynucleotide of that
virus (viral polynucleotide), or (ii) displays sequence identity to polypeptides of that virus as
described above.

"Encoded by" refers to a nucleic acid sequence which codes for a polypeptide
sequence, wherein the polypeptide sequence or a portion thereof contains an amino acid
sequence of at least 3 to 5 amino acids, more preferably at least 8 to 10 amino acids, and even
more preferably at least 15 to 20 amino acids from a polypeptide encoded by the nucleic acid
sequence. Also encompassed are polypeptide sequences which are immunologically
identifiable with a polypeptide encoded by the sequence. Further, polyproteins can be
constructed by fusing in-frame two or more polynucleotide sequences encoding polypeptide
products. Further, polycistronic coding sequences may be produced by placing two or more
polynucleotide sequences encoding polypeptide products adjacent to each other,
under the control of one promoter, wherein each polypeptide coding sequence may
include sequences for internal ribosome binding sites.
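The in-frame fusion used to construct polyproteins can be illustrated at the sequence level: each coding sequence must be a whole number of codons, and every stop codon except the last must be removed so that translation reads through into the next segment. This sketch assumes the standard genetic code, uses the hypothetical helper name `fuse_in_frame`, and does not add linkers, cleavage sites or internal ribosome binding sites.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)


def fuse_in_frame(*orfs: str) -> str:
    """Join coding sequences into one reading frame for a polyprotein."""
    if not orfs:
        raise ValueError("at least one coding sequence is required")
    codons = []
    for index, orf in enumerate(orfs):
        seq = orf.upper()
        if len(seq) % 3 != 0:
            raise ValueError(f"sequence {index} is not a whole number of codons")
        segment = [seq[i:i + 3] for i in range(0, len(seq), 3)]
        is_last = index == len(orfs) - 1
        if not is_last and segment and segment[-1] in STOP_CODONS:
            segment.pop()  # drop the internal stop so translation reads through
        codons.extend(segment)
    return "".join(codons)


print(fuse_in_frame("ATGAAATAA", "ATGCCCTGA"))  # ATGAAAATGCCCTGA
```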

"Purified polynucleotide" refers to a polynucleotide of interest or fragment thereof
which is substantially free, e.g., contains less than about 50%, preferably less than about 70%,
and more preferably less than about 90%, of the protein with which the polynucleotide is
naturally associated. Techniques for purifying polynucleotides of interest are well known in
the art and include, for example, disruption of the cell containing the polynucleotide with a
chaotropic agent and separation of the polynucleotide(s) and proteins by ion-exchange
chromatography, affinity chromatography and sedimentation according to density.

By "nucleic acid immunization" is meant the introduction of a nucleic acid molecule
encoding one or more selected antigens into a host cell, for the in vivo expression of an
antigen, antigens, an epitope, or epitopes. The nucleic acid molecule can be introduced
directly into a recipient subject, such as by injection, inhalation, oral, intranasal and mucosal
administration, or the like, or can be introduced ex vivo, into cells which have been removed
from the host. In the latter case, the transformed cells are reintroduced into the subject where
an immune response can be mounted against the antigen encoded by the nucleic acid
molecule.

"Gene transfer" or "gene delivery" refers to methods or systems for reliably inserting
DNA of interest into a host cell. Such methods can result in transient expression of
non-integrated transferred DNA, extrachromosomal replication and expression of transferred
replicons (e.g., episomes), or integration of transferred genetic material into the genomic
DNA of host cells. Gene delivery expression vectors include, but are not limited to, vectors
derived from alphaviruses, pox viruses and vaccinia viruses. When used for immunization,
such gene delivery expression vectors may be referred to as vaccines or vaccine vectors.

"T lymphocytes" or "T cells" are non-antibody producing lymphocytes that constitute
a part of the cell-mediated arm of the immune system. T cells arise from immature
lymphocytes that migrate from the bone marrow to the thymus, where they undergo a
maturation process under the direction of thymic hormones. Here, the mature lymphocytes
rapidly divide, increasing to very large numbers. The maturing T cells become
immunocompetent based on their ability to recognize and bind a specific antigen. Activation
of immunocompetent T cells is triggered when an antigen binds to the lymphocyte's surface
receptors.

The term "transfection" is used to refer to the uptake of foreign DNA by a cell. A cell
has been "transfected" when exogenous DNA has been introduced inside the cell membrane.
A number of transfection techniques are generally known in the art. See, e.g., Graham et al.
(1973) Virology 52:456; Sambrook et al. (1989) Molecular Cloning, a laboratory manual,
Cold Spring Harbor Laboratories, New York; Davis et al. (1986) Basic Methods in Molecular
Biology, Elsevier; and Chu et al. (1981) Gene 13:197. Such techniques can be used to
introduce one or more exogenous DNA moieties into suitable host cells. The term refers to
both stable and transient uptake of the genetic material, and includes uptake of peptide- or
antibody-linked DNAs.

A "vector" is capable of transferring gene sequences to target cells (e.g., viral vectors,
non-viral vectors, particulate carriers, and liposomes). Typically, "vector construct,"
"expression vector," and "gene transfer vector" mean any nucleic acid construct capable of
directing the expression of a gene of interest and which can transfer gene sequences to target
cells. Thus, the term includes cloning and expression vehicles, as well as viral vectors.

Transfer of a "suicide gene" (e.g., a drug-susceptibility gene) to a target cell renders 
the cell sensitive to compounds or compositions that are relatively nontoxic to normal cells. 
Moolten, F.L. (1994) Cancer Gene Ther. 1:279-287. Examples of suicide genes are 
thymidine kinase of herpes simplex virus (HSV-tk), cytochrome P450 (Manome et al. (1996) 
Gene Therapy 3:513-520), human deoxycytidine kinase (Manome et al. (1996) Nature
Medicine 2(5):567-573) and the bacterial enzyme cytosine deaminase (Dong et al. (1996) 
Human Gene Therapy 7:713-720). Cells which express these genes are rendered sensitive to 
the effects of the relatively nontoxic prodrugs ganciclovir (HSV-tk), cyclophosphamide 
(cytochrome P450 2B1), cytosine arabinoside (human deoxycytidine kinase) or 5- 
fluorocytosine (bacterial cytosine deaminase). Culver et al. (1992) Science 256:1550-1552, 
Huber et al. (1994) Proc. Natl. Acad. Sci. USA 91:8302-8306. 

A "selectable marker" or "reporter marker" refers to a nucleotide sequence included in 
a gene transfer vector that has no therapeutic activity, but rather is included to allow for 
simpler preparation, manufacturing, characterization or testing of the gene transfer vector. 

A "specific binding agent" refers to a member of a specific binding pair of molecules 
wherein one of the molecules specifically binds to the second molecule through chemical 
and/or physical means. One example of a specific binding agent is an antibody directed 
against a selected antigen. 

By "subject" is meant any member of the subphylum Chordata, including, without
limitation, humans and other primates, including non-human primates such as chimpanzees
and other apes and monkey species; farm animals such as cattle, sheep, pigs, goats and
horses; domestic mammals such as dogs and cats; laboratory animals including rodents such
as mice, rats and guinea pigs; birds, including domestic, wild and game birds such as
chickens, turkeys and other gallinaceous birds, ducks, geese, and the like. The term does not
denote a particular age. Thus, both adult and newborn individuals are intended to be covered.
The system described above is intended for use in any of the above vertebrate species, since 
the immune systems of all of these vertebrates operate similarly. 

By "pharmaceutically acceptable" or "pharmacologically acceptable" is meant a 
material which is not biologically or otherwise undesirable, i.e., the material may be
administered to an individual in a formulation or composition without causing any
undesirable biological effects or interacting in a deleterious manner with any of the 
components of the composition in which it is contained. 

By "physiological pH" or a "pH in the physiological range" is meant a pH in the range 
of approximately 7.2 to 8.0 inclusive, more typically in the range of approximately 7.2 to 7.6
inclusive. 

As used herein, "treatment" refers to any of (i) the prevention of infection or
reinfection, as in a traditional vaccine, (ii) the reduction or elimination of symptoms, and (iii)
the substantial or complete elimination of the pathogen in question. Treatment may be
effected prophylactically (prior to infection) or therapeutically (following infection).

By "co-administration" is meant administration of more than one composition or
molecule. Thus, co-administration includes concurrent or sequential administration
(in any order), via the same or different routes of administration. Non-limiting
examples of co-administration regimes include co-administration of nucleic acid and
polypeptide; co-administration of different nucleic acids (e.g., different expression cassettes
as described herein and/or different gene delivery vectors); and co-administration of different
polypeptides (e.g., different HIV polypeptides and/or different adjuvants). The term also
encompasses multiple administrations of one of the co-administered molecules or
compositions (e.g., multiple administrations of one or more of the expression cassettes
described herein followed by one or more administrations of a polypeptide-containing
composition). In cases where the molecules or compositions are delivered sequentially, the
time between each administration can be readily determined by one of skill in the art in view
of the teachings herein.

"Lentiviral vector" and "recombinant lentiviral vector" refer to a nucleic acid
construct which carries, and within certain embodiments is capable of directing the
expression of, a nucleic acid molecule of interest. The lentiviral vector includes at least one
transcriptional promoter/enhancer or locus defining element(s), or other elements which
control gene expression by other means such as alternate splicing, nuclear RNA export,
post-transcriptional modification of messenger RNA, or post-translational modification of
protein. Such vector constructs must also include a packaging signal, long terminal repeats
(LTRs) or portion thereof, and positive and negative strand primer binding sites appropriate to
the retrovirus used (if these are not already present in the retroviral vector). Optionally, the
recombinant lentiviral vector may also include a signal which directs polyadenylation,
selectable markers such as Neo, TK, hygromycin, phleomycin, histidinol, or DHFR, as well
as one or more restriction sites and a translation termination sequence. By way of example,
such vectors typically include a 5' LTR, a tRNA binding site, a packaging signal, an origin of
second strand DNA synthesis, and a 3' LTR or a portion thereof.

"Lentiviral vector particle" as utilized within the present invention refers to a
lentivirus which carries at least one gene of interest. The retrovirus may also contain a
selectable marker. The recombinant lentivirus is capable of reverse transcribing its genetic
material (RNA) into DNA and incorporating this genetic material into a host cell's DNA upon
infection. Lentiviral vector particles may have a lentiviral envelope, a non-lentiviral
envelope (e.g., an ampho or VSV-G envelope), or a chimeric envelope.

"Nucleic acid expression vector" or "expression cassette" refers to an assembly which
is capable of directing the expression of a sequence or gene of interest. The nucleic acid
expression vector includes a promoter which is operably linked to the sequence(s) or gene(s) of
interest. Other control elements may be present as well. Expression cassettes described
herein may be contained within a plasmid construct. In addition to the components of the
expression cassette, the plasmid construct may also include a bacterial origin of replication,
one or more selectable markers, a signal which allows the plasmid construct to exist as
single-stranded DNA (e.g., an M13 origin of replication), a multiple cloning site, and a
"mammalian" origin of replication (e.g., an SV40 or adenovirus origin of replication).

"Packaging cell" refers to a cell which contains those elements necessary for
production of infectious recombinant retrovirus which are lacking in a recombinant retroviral
vector. Typically, such packaging cells contain one or more expression cassettes which are
capable of expressing Gag, Pol and Env proteins.

"Producer cell" or "vector producing cell" refers to a cell which contains all elements
necessary for production of recombinant retroviral vector particles.

2. Modes of Carrying Out the Invention 

Before describing the present invention in detail, it is to be understood that this
invention is not limited to particular formulations or process parameters as such may, of
course, vary. It is also to be understood that the terminology used herein is for the purpose of
describing particular embodiments of the invention only, and is not intended to be limiting.
Although a number of methods and materials similar or equivalent to those described
herein can be used in the practice of the present invention, the preferred materials and
methods are described herein.

2.1. The HIV Genome

The HIV genome and various polypeptide-encoding regions are shown in Table A.
The nucleotide positions are given relative to 8_5_TV1_C.ZA (SEQ ID NO:33, Figure 11).
However, it will be readily apparent to one of ordinary skill in the art in view of the teachings
of the present disclosure how to determine corresponding regions in other HIV strains or
variants (e.g., isolates HIV-IIIb, HIV-SF2, HIV-1 SF162, HIV-1 SF170, HIV-LAV, HIV-LAI,
HIV-MN, HIV-1 CM235, HIV-1 US4, other HIV-1 strains from diverse subtypes (e.g., subtypes
A through G, and O), HIV-2 strains and diverse subtypes (e.g., HIV-2 UC1 and HIV-2 UC2), and
simian immunodeficiency virus (SIV)) (see, e.g., Virology, 3rd Edition (W.K. Joklik ed. 1988);
Fundamental Virology, 2nd Edition (B.N. Fields and D.M. Knipe, eds. 1991); Virology, 3rd
Edition (Fields, B.N., D.M. Knipe, P.M. Howley, Editors, 1996, Lippincott-Raven, Philadelphia,
PA) for a description of these and other related viruses), using, for example, sequence
comparison programs (e.g., BLAST and others described herein) or identification and
alignment of structural features (e.g., a program such as the "ALB" program described herein
that can identify the various regions).

Table A: Regions of the HIV Genome relative to 8_5_TV1_C.ZA

Region                                   Position in nucleotide sequence
5'LTR                                    1-636
  U3                                     1-457
  R                                      458-553
  U5                                     554-636
  NFkB II                                340-348
  NFkB I                                 354-362
  Sp1 III                                379-388
  Sp1 II                                 390-398
  Sp1 I                                  400-410
  TATA Box                               429-433
  TAR                                    474-499
  Poly A signal                          529-534
PBS                                      636-655
p7 binding region, packaging signal      685-791
Gag:                                     792-2285
  p17                                    792-1178
  p24                                    1179-1871
  Cyclophilin A bdg.                     1395-1505
  MHR                                    1632-1694
  p2                                     1872-1907
  p7                                     1908-2072
  Frameshift slip                        2072-2078
  p1                                     2073-2120
  p6Gag                                  2121-2285
  Zn-motif I                             1950-1991
  Zn-motif II                            2013-2054
Pol:                                     2072-5086
  p6Pol                                  2072-2245
  Prot                                   2246-2542
  p66RT                                  2543-4210
  p15RNaseH                              3857-4210
  p31Int                                 4211-5086
Vif:                                     5034-5612
  Hydrophilic region                     5292-5315
Vpr:                                     5552-5839
  Oligomerization                        5552-5677
  Amphipathic α-helix                    5597-5653
Tat:                                     5823-6038 and 8417-8509
  Tat-1 exon                             5823-6038
  Tat-2 exon                             8417-8509
  N-terminal domain                      5823-5885
  Trans-activation domain                5886-5933
  Transduction domain                    5961-5993
Rev:                                     5962-6037 and 8416-8663
  Rev-1 exon                             5962-6037
  Rev-2 exon                             8416-8663
  High-affinity bdg. site                8439-8486
  Leu-rich effector domain               8562-8588
Vpu:                                     6060-6326
  Transmembrane domain                   6060-6161
  Cytoplasmic domain                     6162-6326
Env (gp160):                             6244-8853
  Signal peptide                         6244-6324
  gp120                                  6325-7794
  V1                                     6628-6729
  V2                                     6727-6859
  V3                                     7150-7254
  V4                                     7411-7506
  V5                                     7663-7674
  C1                                     6325-6627
  C2                                     6853-7149
  C3                                     7255-7410
  C4                                     7507-7662
  C5                                     7675-7794
  CD4 binding                            7540-7566
  gp41                                   7795-8853
  Oligomerization domain                 7924-7959
  N-terminal heptad repeat               7921-80?8
  C-terminal heptad repeat               8173-8280
  Immunodominant region                  8023-8076
Nef:                                     8855-9478
  Myristoylation                         8858-8875
  SH3 binding                            9062-9091
  Polypurine tract                       9128-9154
  SH3 binding                            9296-9307



It will be readily apparent that one of skill in the art can align any sequence to
that shown in Table A to determine the relative locations of any particular HIV gene. For
example, using one of the alignment programs described herein (e.g., BLAST), other HIV
Type C sequences can be aligned with 8_5_TV1_C.ZA (Table A) and the locations of genes
determined.
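The alignment-based lookup described above can be sketched in Python. This is a minimal illustration, assuming that a pairwise alignment of 8_5_TV1_C.ZA (the reference) against a query strain has already been produced by an aligner such as BLAST and is available as two gapped strings of equal length. The function name `map_region` and the toy sequences are hypothetical; reference coordinates are 1-based and inclusive, as in Table A.

```python
from typing import Optional, Tuple


def map_region(ref_aln: str, qry_aln: str, ref_start: int, ref_end: int,
               ref_offset: int = 1) -> Optional[Tuple[int, int]]:
    """Map a 1-based, inclusive reference region onto query coordinates.

    ref_aln / qry_aln: gapped alignment rows ('-' marks a gap), equal length.
    ref_offset: reference coordinate of the first non-gap reference base in
    ref_aln (e.g. where a local alignment starts within the genome).
    Returns the first and last query positions aligned to reference bases
    inside the region, or None if every such column is a query gap.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("alignment rows must have equal length")
    ref_pos = ref_offset - 1  # last reference coordinate consumed so far
    qry_pos = 0               # last query coordinate consumed so far
    hits = []
    for r, q in zip(ref_aln, qry_aln):
        if r != "-":
            ref_pos += 1
        if q != "-":
            qry_pos += 1
        # Only columns where both sequences carry a base can be mapped.
        if r != "-" and q != "-" and ref_start <= ref_pos <= ref_end:
            hits.append(qry_pos)
    return (min(hits), max(hits)) if hits else None


# Toy alignment: the query lacks reference base 3 and carries one extra base.
ref = "ACGT-ACGT"
qry = "AC-TTACGT"
print(map_region(ref, qry, 4, 6))  # (3, 6)
print(map_region(ref, qry, 3, 3))  # None: reference base 3 is deleted in the query
```

For a Table A entry such as the MHR (1632-1694), the same call would take `ref_start=1632` and `ref_end=1694`, with `ref_offset` set to the reference position at which the alignment begins.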

Polypeptide sequences can be similarly aligned. For example, Figure 103 shows the
alignment of Env polypeptide sequences from various strains, relative to SF-162. As
described in detail in co-owned WO/39303, Env polypeptides (e.g., gp120, gp140 and gp160)
include a "bridging sheet" comprised of 4 anti-parallel β-strands (β-2, β-3, β-20 and β-21)
that form a β-sheet. Extruding from one pair of the β-strands (β-2 and β-3) are two loops, V1
and V2. The β-2 strand occurs at approximately amino acid residue 113 (Cys) to amino acid
residue 117 (Thr), while β-3 occurs at approximately amino acid residue 192 (Ser) to amino
acid residue 194 (Ile), relative to SF-162 (see Figure 103). The "V1/V2 region" occurs at
approximately amino acid positions 120 (Cys) to residue 189 (Cys), relative to SF-162.
Extruding from the second pair of β-strands (β-20 and β-21) is a "small-loop" structure, also
referred to herein as "the bridging sheet small loop." The locations of both the small loop and
bridging sheet small loop can be determined relative to HXB-2 following the teachings herein
and in WO/39303. Also shown by arrows in Figure 103A-C are approximate sites for
deletion of sequence from the β-sheet region. The "*" denotes N-glycosylation sites that can
be mutated following the teachings of the present specification.

2.2 Synthetic Expression Cassettes 

2.2.1 Modification of HIV-1 Type C Pol-, Prot-, RT-, Int-, Gag, Env, Tat,
Rev, Nef, RNaseH, Vif, Vpr, and Vpu Nucleic Acid Coding Sequences
One aspect of the present invention is the generation of HIV-1 type C coding
sequences, and related sequences, having improved expression relative to the corresponding
wild-type sequences.

2.2.1.1. Modification of Gag Nucleic Acid Coding Sequences 
An exemplary embodiment of the present invention is illustrated herein by modifying
the Gag protein wild-type sequences obtained from the AF110965 and AF110967 strains of
HIV-1, subtype C (see, for example, Korber et al. (1998) Human Retroviruses and AIDS, Los
Alamos, New Mexico: Los Alamos National Laboratory;
Novitsky et al. (1999) J. Virol. 73(5):4427-4432, for molecular cloning of various subtype C
clones from Botswana). Also illustrated herein is the modification of wild-type sequences
from novel isolates 8_5_TV1_C.ZA (also called TV001 or TV1) and 12-5_1_TV2_C.ZA
(also called TV002 or TV2). SEQ ID NO:52 shows the wild-type sequence of Gag from
8_5_TV1_C.ZA and SEQ ID NO:54 shows the wild-type sequence of the major homology
region of Gag (nucleotides 1632-1694 of Table A) of the same strain. SEQ ID NO:100
shows the wild-type sequence of Gag of 12-5_1_TV2_C.ZA.

Gag sequences obtained from other Type C HIV-1 variants may be manipulated in
similar fashion following the teachings of the present specification. Such other variants
include, but are not limited to, Gag protein encoding sequences obtained from the isolates of
HIV-1 Type C, for example as described in Novitsky et al. (1999), supra; Myers et al., infra;
Virology, 3rd Edition (W.K. Joklik ed. 1988); Fundamental Virology, 2nd Edition (B.N.
Fields and D.M. Knipe, eds. 1991); Virology, 3rd Edition (Fields, B.N., D.M. Knipe, P.M.
Howley, Editors, 1996, Lippincott-Raven, Philadelphia, PA); and on the World Wide Web
(Internet), for example at http://hiv-web.lanl.gov/cd-bin/hivDB3/public/wdb/ssampublic and
http://hiv-web.lanl.gov.

First, the HIV-1 codon usage pattern was modified so that the resulting nucleic acid
coding sequence was comparable to codon usage found in highly expressed human genes
(Example 1). HIV codon usage reflects a high content of the nucleotides A or T in the
codon triplet. The effect of HIV-1 codon usage is a high AT content in the DNA
sequence, which results in decreased translational efficiency and instability of the mRNA. In
comparison, highly expressed human codons prefer the nucleotides G or C. The Gag coding
sequences were therefore modified to be comparable to codon usage found in highly expressed
human genes.
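The codon-replacement idea can be illustrated with a minimal Python sketch that back-translates a peptide using a single G/C-ending codon per amino acid and reports the resulting GC content. The codon choices in `PREFERRED_CODON` are an assumption made for this illustration; they are not the codon table of Example 1, and the peptide and function names are hypothetical.

```python
# Illustrative G/C-ending codon for each amino acid (an assumption for this
# sketch, not the codon table used in Example 1).
PREFERRED_CODON = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}


def back_translate(protein: str) -> str:
    """Encode each residue with its single preferred codon."""
    return "".join(PREFERRED_CODON[aa] for aa in protein.upper())


def gc_content(dna: str) -> float:
    """Fraction of G and C nucleotides in a DNA string."""
    dna = dna.upper()
    return sum(base in "GC" for base in dna) / len(dna)


# An A/T-rich encoding of the peptide MGAR versus the G/C-ending recoding.
at_rich = "ATGGGAGCAAGA"
recoded = back_translate("MGAR")
print(recoded)                        # ATGGGCGCCCGC
print(round(gc_content(at_rich), 3))  # 0.5
print(round(gc_content(recoded), 3))  # 0.833
```

Both encodings specify the same peptide (Met-Gly-Ala-Arg); only the synonymous codon choice, and therefore the base composition, differs.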

Second, there are inhibitory (or instability) elements (INS) located within the Gag coding
sequences. The Rev-responsive element (RRE) is a secondary RNA structure that
interacts with the HIV-encoded Rev protein to overcome the expression down-regulating
effects of the INS. To eliminate the dependence on the post-transcriptional activating
mechanisms of RRE and Rev, the instability elements can be inactivated by introducing
multiple point mutations that do not alter the reading frame of the encoded proteins.
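Because the INS-inactivating point mutations are meant to leave the encoded protein unchanged, one simple consistency check is to translate the wild-type and mutated sequences and compare them. The Python sketch below is a minimal illustration using the standard genetic code; the function names and toy sequences are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence ('*' marks a stop codon)."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        b1, b2, b3 = (BASES.index(base) for base in dna[i:i + 3])
        protein.append(AMINO_ACIDS[16 * b1 + 4 * b2 + b3])
    return "".join(protein)


def is_silent_variant(wild_type: str, mutant: str) -> bool:
    """True if the mutant keeps the length and the encoded protein."""
    return len(wild_type) == len(mutant) and translate(wild_type) == translate(mutant)


def point_mutations(wild_type: str, mutant: str) -> list:
    """1-based positions at which two equal-length sequences differ."""
    return [i + 1 for i, (a, b) in enumerate(zip(wild_type, mutant)) if a != b]


wt = "ATGAAAGGA"                           # encodes M K G
print(is_silent_variant(wt, "ATGAAGGGC"))  # True: still M K G
print(point_mutations(wt, "ATGAAGGGC"))    # [6, 9]
print(is_silent_variant(wt, "ATGAGAGGA"))  # False: K becomes R
```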

Subtype C Gag-encoding sequences having inactivated RRE sites are shown, for example, in
Figures 1 (SEQ ID NO:3), 2 (SEQ ID NO:4), 5 (SEQ ID NO:20) and 6 (SEQ ID NO:26).
Similarly, other synthetic polynucleotides derived from other Subtype C strains can be
modified to inactivate the RRE sites.

Modification of the Gag polypeptide coding sequences results in improved expression
relative to the wild-type coding sequences in a number of mammalian cell lines (as well as
other types of cell lines, including, but not limited to, insect cells). Further, expression of the
sequences results in production of virus-like particles (VLPs) by these cell lines (see below).

2.2.1.2 Modification of Env Nucleic Acid Coding Sequences
Similarly, the present invention also includes synthetic Env-encoding polynucleotides
and modified Env proteins. Wild-type Env sequences are obtained from the AF110968 and
AF110975 strains as well as novel strains 8_5_TV1_C.ZA (SEQ ID NO:33) and
12-5_1_TV2_C.ZA (SEQ ID NO:45) of HIV-1, type C (see, for example, Novitsky et al.
(1999) J. Virol. 73(5):4427-4432, for molecular cloning of various subtype C clones from
Botswana). Wild-type Env sequences of 8_5_TV1_C.ZA are shown, for example, in SEQ ID
NO:48 (wild-type Env common region, nucleotides 7486-7629 as shown in Table A) and
SEQ ID NO:50 (wild-type gp160, nucleotides 6244-8853 as shown in Table A). Wild-type
Env gp160 of 12-5_1_TV2_C.ZA is shown in SEQ ID NO:98. It will be readily apparent
from the disclosure herein that polynucleotides encoding fragments of Env gp160 (e.g.,
gp120, gp41, gp140) can be readily obtained from the larger, full-length sequences disclosed
herein. It will also be readily apparent that other modifications can be made, for example
deletion of regions such as the V1 and/or V2 region, mutation of the cleavage site and the like
(see Example 1). Exemplary sequences of such modifications are shown in SEQ ID NOs:119
through 127.
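Because Table A uses 1-based, inclusive nucleotide coordinates, fragments such as gp120 or gp41 can be cut out of the full-length 8_5_TV1_C.ZA sequence by simple slicing. The Python sketch below copies the Env coordinates from Table A and checks that the signal peptide, gp120 and gp41 tile gp160 in whole codons; the helper names are hypothetical, and no actual sequence data are included.

```python
# Env coordinates from Table A (1-based, inclusive, relative to 8_5_TV1_C.ZA).
ENV_REGIONS = {
    "gp160": (6244, 8853),
    "signal peptide": (6244, 6324),
    "gp120": (6325, 7794),
    "gp41": (7795, 8853),
}


def region_length(region) -> int:
    start, end = region
    return end - start + 1


def extract(genome: str, region) -> str:
    """Return the subsequence for a 1-based, inclusive (start, end) pair."""
    start, end = region
    return genome[start - 1:end]


lengths = {name: region_length(r) for name, r in ENV_REGIONS.items()}
print(lengths)  # {'gp160': 2610, 'signal peptide': 81, 'gp120': 1470, 'gp41': 1059}

# The three parts should sum to gp160, and each part should be whole codons.
parts = ("signal peptide", "gp120", "gp41")
print(sum(lengths[p] for p in parts) == lengths["gp160"])  # True
print(all(lengths[p] % 3 == 0 for p in parts))             # True

# With the genome string of SEQ ID NO:33 loaded as `genome`, gp120 would be:
# gp120_dna = extract(genome, ENV_REGIONS["gp120"])
```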

Further, Env sequences obtained from other Type C HIV-1 variants may be
manipulated in similar fashion following the teachings of the present specification. Such
other variants include, but are not limited to, Env protein encoding sequences obtained from
the isolates of HIV-1 Type C described above.

The codon usage pattern for Env was modified as described above for Gag so that the
resulting nucleic acid coding sequence was comparable to codon usage found in highly
expressed human genes. Experiments performed in support of the present invention show
that the synthetic Env sequences were capable of a higher level of protein production relative
to the native Env sequences.

Modification of the Env polypeptide coding sequences results in improved expression
relative to the wild-type coding sequences in a number of mammalian cell lines (as well as
other types of cell lines, including, but not limited to, insect cells). Similar Env polypeptide
coding sequences can be obtained, modified and tested for improved expression from a
variety of isolates, including those described above for Gag.

Further modifications of Env include, but are not limited to, generating
polynucleotides that encode Env polypeptides having mutations and/or deletions therein. For
instance, the hypervariable regions, V1 and/or V2, can be deleted as described herein.
Additionally, other modifications, for example to the bridging sheet region and/or to
N-glycosylation sites within Env, can also be performed following the teachings of the present
specification (see Figure 103A-C and WO/39303). Various combinations of these
modifications can be employed to generate synthetic expression cassettes as described herein.

2.2.1.3 Modification of Sequences Including HIV-1 Pol Nucleic Acid
Coding Sequences

The present invention also includes expression cassettes which include synthetic Pol
sequences. As noted above, "Pol" includes, but is not limited to, the protein-encoding regions
shown in Figure 7, for example polymerase, protease, reverse transcriptase and/or
integrase-containing sequences. The regions shown in Figure 7 are described, for example, in
Wan et al. (1996) Biochem. J. 316:569-573; Kohl et al. (1988) PNAS USA 85:4686-4690;
Krausslich et al. (1988) J. Virol. 62:4393-4397; Coffin, "Retroviridae and their Replication"
in Virology, pp. 1437-1500 (Raven, New York, 1990); Patel et al. (1995) Biochemistry
34:5351-5363. Thus, the synthetic expression cassettes exemplified herein include one or
more of these regions and one or more changes to the resulting amino acid sequences.

Wild-type Pol sequences were obtained from the AF110975, 8_5_TV1_C.ZA and
12-5_1_TV2_C.ZA strains of HIV-1, type C (see, for example, Novitsky et al. (1999) J. Virol.
73(5):4427-4432, for molecular cloning of various subtype C clones from Botswana). SEQ
ID NO:34 shows the wild-type sequence of AF110975 from the p2 through p7 region of Pol
(see Figure 7 and Table A). SEQ ID NO:35 shows the wild-type sequence of AF110975
from p1 through the first 6 amino acids of integrase (see Figure 7 and Table A). SEQ ID
NO:63 and SEQ ID NO:104 show wild-type sequences of Pol from 8_5_TV1_C.ZA and
12-5_1_TV2_C.ZA, respectively (see also Table A).

Sequences obtained from other Type C HIV-1 variants may be manipulated in similar
fashion following the teachings of the present specification. Such other variants include, but 
are not limited to, Pol protein encoding sequences obtained from the isolates of HIV-1 Type 
C described herein. 

The codon usage pattern for Pol was modified as described above for Gag and Env so
that the resulting nucleic acid coding sequence was comparable to codon usage found in
highly expressed human genes.

Table B shows the nucleotide positions of various regions found in the Pol constructs
exemplified herein (e.g., SEQ ID NOs:30-32).



Table B

Region                                  Position in nucleotide sequence in construct
                                        PR975(+)       PR975YM        PR975YMWM
                                        SEQ ID NO:30   SEQ ID NO:31   SEQ ID NO:32
Sal I restriction site                  1-6            1-6            1-6
Kozak start codon                       7-16           7-16           7-16
p2                                      16-54          16-54          16-54
p7                                      55-219         55-219         55-219
p1/p6 pol                               220-375        220-375        220-375
Insertion mutation for in frame         225            225            225
p10 Protease                            376-672        376-672        376-672
p66RT                                   673-2352       673-2346       673-2340
p51RT                                   673-1992       673-1986       673-1980
p15RNaseH                               1993-2352      1993-2346      1993-2340
catalytic center region (YMDD)          1219-1230      1219-1224      1219-1224
primer grip region (WMGY)               1357-1368      1351-1362      1351-1356
6aa Integrase                           2353-2370      2347-2364      2341-2358
YMDD epitope cassette (incl. 5'+3' Gly) 2371-2424      2365-2418      2359-2412
MCS (multiple cloning site)             2425-2463      2419-2457      2413-2451
EcoR I restriction site                 2464-2469      2458-2463      2452-2457
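For most rows (for example the integrase, epitope cassette, MCS and EcoR I coordinates), the differences between the three columns of Table B follow from replacing 12 nucleotides (YMDD or WMGY) with 6 nucleotides ("AP" or "PL"): a feature lying downstream of a replacement shifts 6 positions toward the 5' end, and a feature containing a replacement shrinks by 6. A minimal Python sketch of that bookkeeping follows; the function name and the edit representation (original start, original end and new length, all in PR975(+) coordinates) are assumptions made for illustration.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int]       # 1-based, inclusive
Edit = Tuple[int, int, int]  # (start, end, new_length) in parent coordinates


def remap(feature: Span, edits: Iterable[Edit]) -> Span:
    """Project a feature from the parent construct through length-changing edits."""
    start, end = feature
    new_start, new_end = start, end
    for e_start, e_end, new_len in edits:
        delta = new_len - (e_end - e_start + 1)
        if e_end < start:                          # edit upstream: shift both ends
            new_start += delta
            new_end += delta
        elif start <= e_start and e_end <= end:    # edit inside feature: resize it
            new_end += delta
        elif e_start > end:                        # edit downstream: no effect
            continue
        else:
            raise ValueError("edit partially overlaps the feature")
    return new_start, new_end


YMDD_TO_AP = (1219, 1230, 6)  # catalytic center, PR975(+) coordinates
WMGY_TO_PL = (1357, 1368, 6)  # primer grip, PR975(+) coordinates

mcs = (2425, 2463)  # MCS in PR975(+)
print(remap(mcs, [YMDD_TO_AP]))                       # (2419, 2457), as in PR975YM
print(remap(mcs, [YMDD_TO_AP, WMGY_TO_PL]))           # (2413, 2451), as in PR975YMWM
print(remap((1357, 1368), [YMDD_TO_AP, WMGY_TO_PL]))  # (1351, 1356)
```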



As shown in Table B, exemplary constructs were modified in various ways. For
example, the expression constructs exemplified herein include sequence that encodes the first
6 amino acids of the integrase polypeptide. This 6 amino acid region is believed to provide a
cleavage recognition site recognized by HIV protease (see, e.g., McCornack et al. (1997)
FEBS Lett. 414:84-88). As noted above, certain constructs exemplified herein include a
multiple cloning site (MCS) for insertion of one or more transgenes, typically at the 3' end of
the construct. In addition, a cassette encoding a catalytic center epitope derived from the
catalytic center in RT is typically included 3' of the sequence encoding the 6 amino acids of
integrase. This cassette (SEQ ID NO:36) encodes Ile178 through Ser191 of RT (amino
acids 3 through 16 of SEQ ID NO:37) and was added to keep this well-conserved region as a
possible CTL epitope. Further, the constructs contain an insertion mutation (position 225 of
SEQ ID NOs:30 to 32) to preserve the reading frame (see, e.g., Park et al. (1991) J. Virol.
65:5111).
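The relationship between the epitope cassette and the RT residues it carries comes down to codon arithmetic. The Python sketch below assumes, for illustration only, that SEQ ID NO:37 is the in-frame translation of the cassette region listed in Table B (nucleotides 2371-2424 of SEQ ID NO:30), so that the cassette spans 18 codons; the helper name is hypothetical.

```python
def codon_span(region_start: int, first_aa: int, last_aa: int):
    """Nucleotide span (1-based, inclusive) of residues first_aa..last_aa of a
    reading frame that begins at nucleotide region_start."""
    return region_start + 3 * (first_aa - 1), region_start + 3 * last_aa - 1


cassette = (2371, 2424)  # YMDD epitope cassette in SEQ ID NO:30 (Table B)
n_codons = (cassette[1] - cassette[0] + 1) // 3
rt_span = codon_span(cassette[0], 3, 16)  # residues 3-16 of the cassette

print(n_codons)                      # 18
print(rt_span)                       # (2377, 2418)
print(16 - 3 + 1 == 191 - 178 + 1)   # True: 14 residues, Ile178 through Ser191
```

Under the same assumption, the two codons on either side of residues 3-16 would be the flanking glycine codons referred to in Table B.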
In certain embodiments, the catalytic center and/or primer grip region of RT are
modified. The catalytic center and primer grip regions of RT are described, for example, in
Patel et al. (1995) Biochem. 34:5351 and Palaniappan et al. (1997) J. Biol. Chem.
272(17):11157. For example, in the construct designated PR975YM (SEQ ID NO:31), the
wild-type sequence encoding the amino acids YMDD at positions 183-186 of p66 RT, numbered
relative to AF110975, is replaced with sequence encoding the amino acids "AP". In the
construct designated PR975YMWM (SEQ ID NO:32), the same mutation in YMDD is made
and, in addition, the primer grip region (amino acids WMGY, residues 229-232 of p66RT,
numbered relative to AF110975) is replaced with sequence encoding the amino acids "PL".
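The residue numbers above can be cross-checked against Table B, where p66RT begins at nucleotide 673 of PR975(+) (SEQ ID NO:30). The minimal Python sketch below converts construct nucleotide coordinates into p66 RT residue numbers. It assumes the RT reading frame starts exactly at nucleotide 673, and the function name is hypothetical.

```python
RT_START_NT = 673  # first nucleotide of p66RT in PR975(+), from Table B


def residue_span(nt_start: int, nt_end: int, cds_start: int = RT_START_NT):
    """Convert a codon-aligned nucleotide span into 1-based residue numbers."""
    if (nt_start - cds_start) % 3 or (nt_end - cds_start + 1) % 3:
        raise ValueError("span is not aligned to codons of the reading frame")
    return (nt_start - cds_start) // 3 + 1, (nt_end - cds_start + 1) // 3


print(residue_span(1219, 1230))  # (183, 186): catalytic center YMDD
print(residue_span(1357, 1368))  # (229, 232): primer grip WMGY
print(residue_span(673, 2352))   # (1, 560): full p66RT in PR975(+)
```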

For the Pol sequence, the changes in codon usage are typically restricted to the
regions up to the -1 frameshift and starting again at the end of the Gag reading frame;
however, regions within the frameshift translation region can be modified as well. Finally,
inhibitory (or instability) elements (INS) located within the coding sequences of the protease
polypeptide coding sequence can be altered as well.
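In the virus, Pol is translated as part of a Gag-Pol precursor after a -1 ribosomal frameshift near the end of the Gag reading frame. In a contiguous synthetic construct, a single inserted nucleotide can instead place the downstream sequence in frame, which is consistent with the insertion mutation at position 225 listed in Table B. The Python sketch below illustrates this equivalence on a toy sequence; the sequence, the shift position and the function names are hypothetical, and the model ignores the slippery-site mechanics of real frameshifting.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(dna: str) -> str:
    """Translate frame 0 of a DNA string, ignoring a trailing partial codon."""
    out = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        b1, b2, b3 = (BASES.index(base) for base in dna[i:i + 3])
        out.append(AMINO_ACIDS[16 * b1 + 4 * b2 + b3])
    return "".join(out)


def translate_with_minus1_shift(dna: str, shift_at: int) -> str:
    """Model a ribosome that reads codons up to nucleotide shift_at (a multiple
    of 3), slips back one nucleotide, and continues in the new frame."""
    if shift_at % 3:
        raise ValueError("shift_at must fall on a codon boundary")
    return translate(dna[:shift_at]) + translate(dna[shift_at - 1:])


def insert_base(dna: str, position: int, base: str) -> str:
    """Insert one nucleotide so that it occupies 0-based index `position`."""
    return dna[:position] + base + dna[position:]


seq = "ATGGCAAAGGCTGG"
print(translate(seq))                          # MAKA   (no frameshift)
print(translate_with_minus1_shift(seq, 6))     # MAKGW  (-1 frameshift after codon 2)
# Duplicating the nucleotide just before the shift point gives the same
# protein from a single, contiguous reading frame.
print(translate(insert_base(seq, 6, seq[5])))  # MAKGW
```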

Experiments can be performed in support of the present invention to show that the
synthetic Pol sequences are capable of a higher level of protein production relative to the
native Pol sequences. Modification of the Pol polypeptide coding sequences results in
improved expression relative to the wild-type coding sequences in a number of mammalian
cell lines (as well as other types of cell lines, including, but not limited to, insect cells).
Similar Pol polypeptide coding sequences can be obtained, modified and tested for improved
expression from a variety of isolates, including those described above for Gag and Env.

2.2.1.4 Modification of Other HIV Sequences

The present invention also includes expression cassettes which include synthetic HIV
Type C sequences derived from HIV genes other than Gag, Env and Pol, including, but not
limited to, regions within Gag, Env and Pol, as well as vif, vpr, tat, rev, vpu, and nef, for
example from 8_5_TV1_C.ZA (SEQ ID NO:33) or 12-5_1_TV2_C.ZA (SEQ ID NO:45).
Sequences obtained from other strains can be manipulated in similar fashion following the
teachings of the present specification.

As noted above, the codon usage pattern is modified as described above for Gag, Env
and Pol so that the resulting nucleic acid coding sequence is comparable to codon usage
found in highly expressed human genes. Experiments can be performed in support of the
present invention to show that these synthetic sequences are capable of a higher level of
protein production relative to the native sequences and that modification of the wild-type
polypeptide coding sequences results in improved expression relative to the wild-type coding
sequences in a number of mammalian cell lines (as well as other types of cell lines, including,
but not limited to, insect cells). Furthermore, the nucleic acid sequence can also be modified
to introduce mutations into one or more regions of the gene, for instance to render the gene
product non-functional and/or to eliminate the myristoylation site in Nef.

Synthetic expression cassettes exemplified herein include SEQ ID NO:49 and SEQ ID
NO:97 (Env gp160-encoding sequences, modified based on 8_5_TV1_C.ZA wild type and
12-5_1_TV2_C.ZA wild type, respectively); SEQ ID NO:51 and SEQ ID NO:99
(Gag-encoding sequences, modified based on 8_5_TV1_C.ZA wild type and
12-5_1_TV2_C.ZA wild type, respectively); SEQ ID NO:53 (Gag major homology region,
modified based on 8_5_TV1_C.ZA wild type); SEQ ID NO:55 and SEQ ID NO:101
(Nef-encoding sequences, modified based on 8_5_TV1_C.ZA wild type and
12-5_1_TV2_C.ZA wild type, respectively); SEQ ID NO:57 and SEQ ID NO:134
(Nef-encoding sequences with a mutation at position 125 resulting in a non-functional gene
product, modified based on 8_5_TV1_C.ZA wild type and 12-5_1_TV2_C.ZA, respectively);
SEQ ID NO:58 (RNaseH-encoding sequences, modified based on 8_5_TV1_C.ZA wild type);
SEQ ID NO:60 (integrase-encoding sequences, modified based on 8_5_TV1_C.ZA wild
type); SEQ ID NO:62 and SEQ ID NO:103 (Pol-encoding sequences, modified based on
8_5_TV1_C.ZA wild type and 12-5_1_TV2_C.ZA wild type, respectively); SEQ ID NO:64
(protease-encoding sequences, modified based on 8_5_TV1_C.ZA wild type); SEQ ID
NO:66 (inactivated protease-encoding sequences, modified based on 8_5_TV1_C.ZA wild
type); SEQ ID NO:68 (inactivated protease and RT mutated sequences, modified based on
8_5_TV1_C.ZA wild type); SEQ ID NO:70 (protease and reverse transcriptase-encoding
sequences, modified based on 8_5_TV1_C.ZA wild type); SEQ ID NO:72 and SEQ ID
NO:105 (exon 1 of Rev, modified based on 8_5_TV1_C.ZA wild type and
12-5_1_TV2_C.ZA wild type, respectively); SEQ ID NO:74 and SEQ ID NO:107 (exon 2 of
Rev, modified based on 8_5_TV1_C.ZA wild type and 12-5_1_TV2_C.ZA wild type,
respectively); SEQ ID NO:76 (reverse transcriptase-encoding sequences, modified based on
8_5_TV1_C.ZA wild type); SEQ ID NO:78 (mutated reverse transcriptase, modified based
on 8_5_TV1_C.ZA wild type); SEQ ID NO:80 (exon 1 of Tat including a mutation that
results in non-functional Tat, modified based on 8_5_TV1_C.ZA wild type); SEQ ID NO:81
and SEQ ID NO:109 (exon 1 of Tat, modified based on 8_5_TV1_C.ZA wild type and
12-5_1_TV2_C.ZA wild type, respectively); SEQ ID NO:83 and SEQ ID NO:111 (exon 2 of
Tat, modified based on 8_5_TV1_C.ZA wild type and 12-5_1_TV2_C.ZA wild type,
respectively); SEQ ID NO:85 and SEQ ID NO:113 (Vif-encoding sequences, modified
based on 8_5_TV1_C.ZA wild type and 12-5_1_TV2_C.ZA wild type, respectively); SEQ
ID NO:87 and SEQ ID NO:115 (Vpr-encoding sequences, modified based on
8_5_TV1_C.ZA wild type and 12-5_1_TV2_C.ZA wild type, respectively); SEQ ID NO:89
and SEQ ID NO:117 (Vpu-encoding sequences, modified based on 8_5_TV1_C.ZA wild
type and 12-5_1_TV2_C.ZA wild type, respectively); SEQ ID NO:91 (sequences of exons 1
and 2 of Rev, modified based on 8_5_TV1_C.ZA wild type); SEQ ID NO:93 (sequences of
mutated exon 1 of Tat and exon 2 of Tat, where mutation of exon 1 results in non-functional
Tat, modified based on 8_5_TV1_C.ZA wild type); SEQ ID NO:94 (sequences of exons 1
and 2 of Tat, modified based on 8_5_TV1_C.ZA wild type); and SEQ ID NO:96 and SEQ ID
NO:135 (Nef-encoding sequences including a mutation to eliminate the myristoylation site,
modified based on 8_5_TV1_C.ZA wild type and 12-5_1_TV2_C.ZA, respectively).

2.2.1.5 Further Modification of Sequences Including HIV-1 Nucleic Acid 
Coding Sequences 

The Type C HIV polypeptide-encoding expression cassettes described herein may
also contain one or more further sequences encoding, for example, one or more transgenes.
Further sequences (e.g., transgenes) useful in the practice of the present invention include, but
are not limited to, sequences encoding further viral epitopes/antigens (including, but not
limited to, HCV antigens (e.g., E1, E2; Houghton, M., et al., U.S. Patent No. 5,714,596,
issued February 3, 1998; Houghton, M., et al., U.S. Patent No. 5,712,088, issued January 27,
1998; Houghton, M., et al., U.S. Patent No. 5,683,864, issued November 4, 1997; Weiner,
A.J., et al., U.S. Patent No. 5,728,520, issued March 17, 1998; Weiner, A.J., et al., U.S. Patent
No. 5,766,845, issued June 16, 1998; Weiner, A.J., et al., U.S. Patent No. 5,670,152, issued
September 23, 1997) and HIV antigens (e.g., derived from tat, rev, nef and/or env)); and
sequences encoding tumor antigens/epitopes. Further sequences may also be derived from
non-viral sources, for instance, sequences encoding cytokines such as interleukin-2 (IL-2),
stem cell factor (SCF), interleukin-3 (IL-3), interleukin-6 (IL-6), interleukin-12 (IL-12),
G-CSF, granulocyte macrophage-colony stimulating factor (GM-CSF), interleukin-1
alpha (IL-1α), interleukin-11 (IL-11), MIP-1α, tumor necrosis factor (TNF), leukemia
inhibitory factor (LIF), c-kit ligand, thrombopoietin (TPO) and flt3 ligand, commercially
available from several vendors such as, for example, Genzyme (Framingham, MA),
Genentech (South San Francisco, CA), Amgen (Thousand Oaks, CA), R&D Systems and
Immunex (Seattle, WA). Additional sequences are described below, for example in Section
2.3. Also, variations on the orientation of the Gag and other coding sequences, relative to
each other, are described below.

HIV polypeptide coding sequences can be obtained from other Type C HIV isolates;
see, e.g., Myers et al., Los Alamos Database, Los Alamos National Laboratory, Los Alamos,
New Mexico (1992); Myers et al., Human Retroviruses and AIDS, 1997, Los Alamos, New
Mexico: Los Alamos National Laboratory. Synthetic expression cassettes can be generated
using such coding sequences as starting material by following the teachings of the present
specification (e.g., see Example 1).

Further, the synthetic expression cassettes of the present invention include related
polypeptide sequences having greater than 85%, preferably greater than 90%, more preferably
greater than 95%, and most preferably greater than 98% sequence identity to the synthetic
expression cassette sequences disclosed herein (for example, SEQ ID NOs:30-32; SEQ ID
NOs:3, 4, 20, and 21; and SEQ ID NOs:5-17). Various coding regions are indicated in
Figures 3 and 4. For example, in Figure 3 (AF110968), nucleotides 1-81 (SEQ ID NO:18);
nucleotides 82-1512 (SEQ ID NO:6) encode a gp120 polypeptide, nucleotides 1513 to 2547
(SEQ ID NO:10) encode a gp41 polypeptide, nucleotides 82-2025 (SEQ ID NO:7) encode a
gp140 polypeptide, and nucleotides 82-2547 (SEQ ID NO:8) encode a gp160 polypeptide.
Similarly, in Figure 98 (SEQ ID NO:127, strain 8_2_TV1_C.ZA), nucleotides 1-6 are an
EcoRI restriction site; nucleotides 7-87 encode a wild-type (from 8_2_TV1_C.ZA) leader
signal peptide; nucleotides 88 to 1563 encode a gp120 polypeptide; nucleotides 88 to 2064
encode a gp140 polypeptide; and nucleotides 88 to 2607 encode a gp160 polypeptide.
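The coordinate ranges above are 1-based and inclusive, and the specification does not say how percent sequence identity is to be computed. As a hypothetical sketch only, the helpers below (all names invented for illustration) extract a stated region, report the length of each Figure 98 region in codons, and compute identity over a pair of sequences that have already been aligned. Treating gap columns as mismatches is an assumption of this sketch, not the method of the specification.

```python
from typing import Dict, Tuple


def region_length(start: int, end: int) -> int:
    """Number of nucleotides in a 1-based, inclusive coordinate range."""
    if start < 1 or end < start:
        raise ValueError(f"invalid range {start}-{end}")
    return end - start + 1


def extract_region(seq: str, start: int, end: int) -> str:
    """Return nucleotides start..end (1-based, inclusive) of seq."""
    region_length(start, end)  # validates the range itself
    if end > len(seq):
        raise ValueError(f"range {start}-{end} exceeds sequence length {len(seq)}")
    return seq[start - 1:end]


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns containing a gap ('-') are counted as mismatches; this is one of
    several common conventions and is an assumption of this sketch.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y and x != "-")
    return 100.0 * matches / len(a)


# Coordinates stated in the text for Figure 98 (SEQ ID NO:127).
FIG98_REGIONS: Dict[str, Tuple[int, int]] = {
    "EcoRI site": (1, 6),
    "leader signal peptide": (7, 87),
    "gp120": (88, 1563),
    "gp140": (88, 2064),
    "gp160": (88, 2607),
}

if __name__ == "__main__":
    for name, (s, e) in FIG98_REGIONS.items():
        n = region_length(s, e)
        print(f"{name}: {n} nt ({n // 3} codons, remainder {n % 3})")
```

Run as a script, this reports, for example, 1476 nt (492 codons) for the gp120 region; every range listed for Figure 98 divides evenly by three, as expected for in-frame coding regions.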


2.2.3 Expression of Synthetic Sequences Encoding HIV-1 Subtype C and Related Polypeptides
Synthetic HIV-encoding sequences (expression cassettes) of the present invention can
be cloned into a number of different expression vectors to evaluate levels of expression and,
in the case of Gag, production of VLPs. The synthetic DNA fragments for HIV polypeptides
can be cloned into eucaryotic expression vectors, including a transient expression vector,
CMV-promoter-based mammalian vectors, and a shuttle vector for use in baculovirus
expression systems. Corresponding wild-type sequences can also be cloned into the same
vectors.

These vectors can then be transfected into several different cell types, including a
variety of mammalian cell lines (293, RD, COS-7, and CHO; cell lines available, for
example, from the A.T.C.C.). The cell lines are then cultured under appropriate conditions,
and the levels of any appropriate polypeptide product can be evaluated in supernatants (see
Table A and Example 2). For example, p24 can be used to evaluate Gag expression; gp160,
gp140 or gp120 can be used to evaluate Env expression; p6pol can be used to evaluate Pol
expression; prot can be used to evaluate protease; p15 for RNase H; p31 for integrase; and
other appropriate polypeptides for Vif, Vpr, Tat, Rev, Vpu and Nef. Further, modified
polypeptides can also be used; for example, other Env polypeptides include, but are not
limited to, native gp160, oligomeric gp140, and monomeric gp120, as well as modified and/or
synthetic sequences of these polypeptides. The results of these assays demonstrate that
expression from synthetic HIV polypeptide-encoding sequences is significantly higher than
expression from the corresponding wild-type sequences.

Further, Western blot analysis can be used to show that cells containing the synthetic
expression cassette produce the expected protein at higher per-cell concentrations than cells
containing the native expression cassette. The HIV proteins can be seen in both cell lysates
and supernatants. The levels of production are significantly higher in cell supernatants for
cells transfected with the synthetic expression cassettes of the present invention.

Fractionation of the supernatants from mammalian cells transfected with the synthetic
expression cassette can be used to show that the cassettes provide superior production of HIV
proteins and, in the case of Gag, VLPs, relative to the wild-type sequences.

Efficient expression of these HIV-containing polypeptides in mammalian cell lines
provides the following benefits: the polypeptides are free of baculovirus contaminants;
production by established methods approved by the FDA; increased purity; greater yields
(relative to native coding sequences); and a novel method of producing the Subtype C
HIV-containing polypeptides in CHO cells, which is not feasible in the absence of the
increased expression obtained using the constructs of the present invention. Exemplary
mammalian cell lines include, but are not limited to, BHK, VERO, HT1080, 293, 293T, RD,
COS-7, CHO, Jurkat, HUT, SUPT, C8166, MOLT4/clone8, MT-2, MT-4, H9, PM1, CEM,
and CEMX174 (such cell lines are available, for example, from the A.T.C.C.).
A synthetic Gag expression cassette of the present invention will also exhibit high
levels of expression and VLP production when transfected into insect cells. Synthetic
expression cassettes described herein also demonstrate high levels of expression in insect
cells. Further, in addition to a higher total protein yield, the final product from the synthetic
polypeptides consistently contains lower amounts of contaminating baculovirus proteins than
the final product from the native Type C sequences.

Further, synthetic expression cassettes of the present invention can also be introduced
into yeast vectors which, in turn, can be transformed into and efficiently expressed by yeast
cells (Saccharomyces cerevisiae; using vectors as described in Rosenberg, S. and
Tekamp-Olson, P., U.S. Patent No. RE35,749, issued March 17, 1998).
In addition to the mammalian and insect vectors, the synthetic expression cassettes of
the present invention can be incorporated into a variety of expression vectors using selected
expression control elements. Appropriate vectors and control elements for any given cell
type can be selected by one having ordinary skill in the art in view of the teachings of the
present specification and information known in the art about expression vectors.

For example, a synthetic expression cassette can be inserted into a vector which
includes control elements operably linked to the desired coding sequence, which allow for the
expression of the gene in a selected cell type. Typical promoters for mammalian
cell expression include the SV40 early promoter, a CMV promoter such as the CMV
immediate early promoter (a CMV promoter can include intron A), RSV, HIV-LTR, the mouse
mammary tumor virus LTR promoter (MMTV-LTR), the adenovirus major late promoter (Ad
MLP), and the herpes simplex virus promoter, among others. Nonviral promoters, such
as a promoter derived from the murine metallothionein gene, will also find use for
mammalian expression. Typically, transcription termination and polyadenylation sequences
will also be present, located 3' to the translation stop codon. Preferably, a sequence for
optimization of initiation of translation, located 5' to the coding sequence, is also present.
Examples of transcription terminator/polyadenylation signals include those derived from
SV40, as described in Sambrook, et al., supra, as well as a bovine growth hormone
terminator sequence. Introns, containing splice donor and acceptor sites, may also be
designed into the constructs for use with the present invention (Chapman et al., Nuc. Acids
Res. (1991) 19:3979-3986).
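The ordering described in the preceding paragraph (translation initiation 5' to the coding sequence; termination and polyadenylation sequences 3' to the stop codon) can be checked mechanically. The following hypothetical Python sketch finds the open reading frame that begins at the first ATG and tests whether the canonical AATAAA poly(A) signal hexamer occurs downstream of its stop codon. Real terminator/polyadenylation regions, such as those from SV40 or bovine growth hormone, contain more than this hexamer, so the check is a simplification; the function names and the toy cassette are invented for illustration.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
POLYA_SIGNAL = "AATAAA"  # canonical mammalian poly(A) signal hexamer


def first_orf(seq: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the ORF beginning at the first ATG.

    Coordinates are 0-based and half-open; end includes the stop codon.
    Returns None if there is no ATG or no in-frame stop codon after it.
    """
    seq = seq.upper()
    start = seq.find("ATG")
    if start == -1:
        return None
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return start, i + 3
    return None


def polya_signal_after_stop(seq: str) -> bool:
    """True if a poly(A) signal hexamer lies 3' to the stop codon of the first ORF."""
    orf = first_orf(seq)
    return orf is not None and seq.upper().find(POLYA_SIGNAL, orf[1]) != -1


# Hypothetical toy cassette: leader, ATG, three codons, stop, spacer, poly(A) signal.
cassette = "GCCACC" + "ATG" + "GCTGCTGCT" + "TAA" + "CCCC" + "AATAAA" + "TTTT"
```

For the toy cassette, `first_orf` returns `(6, 21)` and `polya_signal_after_stop` returns `True`; moving the hexamer upstream of the start codon makes the check fail.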

Enhancer elements may also be used herein to increase expression levels of the
mammalian constructs. Examples include the SV40 early gene enhancer, as described in
Dijkema et al., EMBO J. (1985) 4:761; the enhancer/promoter derived from the long terminal
repeat (LTR) of the Rous sarcoma virus, as described in Gorman et al., Proc. Natl. Acad.
Sci. USA (1982b) 79:6777; and elements derived from human CMV, as described in Boshart
et al., Cell (1985) 41:521, such as elements included in the CMV intron A sequence
(Chapman et al., Nuc. Acids Res. (1991) 19:3979-3986).

The desired synthetic polypeptide encoding sequences can be cloned into any number
of commercially available vectors to generate expression of the polypeptide in an appropriate
host system. These systems include, but are not limited to, the following: baculovirus
expression {Reilly, P.R., et al., Baculovirus Expression Vectors: A Laboratory
Manual (1992); Beames, et al., Biotechniques 11:378 (1991); Pharmingen; Clontech, Palo
Alto, CA}, vaccinia expression {Earl, P.L., et al., "Expression of proteins in mammalian
cells using vaccinia" In Current Protocols in Molecular Biology (F.M. Ausubel, et al., Eds.),
Greene Publishing Associates & Wiley Interscience, New York (1991); Moss, B., et al., U.S.
Patent Number 5,135,855, issued 4 August 1992}, expression in bacteria {Ausubel, F.M., et
al., Current Protocols in Molecular Biology, John Wiley and Sons, Inc., Media PA;
Clontech}, expression in yeast {Rosenberg, S. and Tekamp-Olson, P., U.S. Patent No.
RE35,749, issued March 17, 1998; Shuster, J.R., U.S. Patent No. 5,629,203, issued May 13,
1997; Gellissen, G., et al., Antonie Van Leeuwenhoek, 62(1-2):79-93 (1992); Romanos, M.A.,
et al., Yeast 8(6):423-488 (1992); Goeddel, D.V., Methods in Enzymology 185 (1990);
Guthrie, C., and G.R. Fink, Methods in Enzymology 194 (1991)}, expression in mammalian
cells {Clontech; Gibco-BRL, Grand Island, NY; e.g., Chinese hamster ovary (CHO) cell
lines (Haynes, J., et al., Nuc. Acid Res. 11:687-706 (1983); Lau, Y.F., et al., Mol. Cell.
Biol. 4:1469-1475 (1984); Kaufman, R.J., "Selection and coamplification of heterologous
genes in mammalian cells," in Methods in Enzymology, vol. 185, pp. 537-566, Academic
Press, Inc., San Diego CA (1991))}, and expression in plant cells {plant cloning vectors,
Clontech Laboratories, Inc., Palo Alto, CA, and Pharmacia LKB Biotechnology, Inc.,
Piscataway, NJ; Hood, E., et al., J. Bacteriol. 168:1291-1301 (1986); Nagel, R., et al., FEMS
Microbiol. Lett. 67:325 (1990); An, et al., "Binary Vectors", and others in Plant Molecular
Biology Manual A3:1-19 (1988); Miki, B.L.A., et al., pp. 249-265, and others in Plant DNA
Infectious Agents (Hohn, T., et al., eds.) Springer-Verlag, Wien, Austria, (1987); Plant
Molecular Biology: Essential Techniques, P.G. Jones and J.M. Sutton, New York, J. Wiley,
1997; Miglani, Gurbachan, Dictionary of Plant Genetics and Molecular Biology, New York,
Food Products Press, 1998; Henry, R.J., Practical Applications of Plant Molecular Biology,
New York, Chapman & Hall, 1997}.

Also included in the invention is an expression vector containing coding sequences
and expression control elements which allow expression of the coding regions in a suitable
host. The control elements generally include a promoter, a translation initiation codon,
translation and transcription termination sequences, and an insertion site for introducing the
insert into the vector. Translational control elements have been reviewed by M. Kozak (e.g.,
Kozak, M., Mamm. Genome 7(8):563-574, 1996; Kozak, M., Biochimie 76(9):815-821, 1994;
Kozak, M., J. Cell Biol. 108(2):229-241, 1989; Kozak, M., and Shatkin, A.J., Methods
Enzymol. 60:360-375, 1979).
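As an illustration of the translational context reviewed by Kozak, the sketch below scores the sequence around a start codon using the commonly cited rule that a purine (A or G) at position -3 and a G at position +4 (numbering the A of ATG as +1) favor initiation. The "strong"/"adequate"/"weak" labels and the function name are assumptions made for this example, not definitions taken from the specification.

```python
PURINES = {"A", "G"}


def kozak_context(seq: str, atg: int) -> str:
    """Classify the initiation context of the ATG starting at 0-based index atg.

    Position +1 is the A of ATG, so position -3 is seq[atg - 3] and
    position +4 is seq[atg + 3].
    """
    seq = seq.upper()
    if seq[atg:atg + 3] != "ATG":
        raise ValueError(f"no ATG at index {atg}")
    if atg < 3 or atg + 3 >= len(seq):
        raise ValueError("need 3 nt upstream and 1 nt downstream of the ATG")
    purine_at_minus3 = seq[atg - 3] in PURINES
    g_at_plus4 = seq[atg + 3] == "G"
    if purine_at_minus3 and g_at_plus4:
        return "strong"
    if purine_at_minus3 or g_at_plus4:
        return "adequate"
    return "weak"
```

For example, `kozak_context("GCCACCATGG", 6)` returns `"strong"`, while `kozak_context("GCCTCCATGT", 6)` returns `"weak"`.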

Expression in yeast systems has the advantage of being amenable to commercial production.
Recombinant protein production by vaccinia and CHO cell lines has the advantage of using
mammalian expression systems. Further, vaccinia virus expression has several advantages,
including the following: (i) its wide host range; (ii) faithful post-translational modification,
processing, folding, transport, secretion, and assembly of recombinant proteins; (iii) high-level
expression of relatively soluble recombinant proteins; and (iv) a large capacity to
accommodate foreign DNA.

The recombinantly expressed polypeptides from synthetic HIV polypeptide-encoding
expression cassettes are typically isolated from lysed cells or culture media. Purification can
be carried out by methods known in the art, including salt fractionation, ion exchange
chromatography, gel filtration, size-exclusion chromatography, size fractionation, and
affinity chromatography. Immunoaffinity chromatography can be employed using antibodies
generated based on, for example, HIV antigens.

Advantages of expressing the proteins of the present invention using mammalian cells
include, but are not limited to, the following: well-established protocols for scale-up
production; the ability to produce VLPs; cell lines suitable for meeting good manufacturing
practice (GMP) standards; and culture conditions for mammalian cells that are known in the art.

Various forms of the different embodiments of the invention, described herein, may 
be combined. 


2.3 Production of Virus-like Particles and Use of the Constructs of the Present Invention to Create Packaging Cell Lines
The group-specific antigens (Gag) of human immunodeficiency virus type 1 (HIV-1)
self-assemble into noninfectious virus-like particles (VLPs) that are released from various
eucaryotic cells by budding (reviewed by Freed, E.O., Virology 251:1-15, 1998). The
synthetic expression cassettes of the present invention provide efficient means for the
production of HIV Gag virus-like particles (VLPs) using a variety of different cell types,
including, but not limited to, mammalian cells.

Viral particles can be used as a matrix for the proper presentation of an antigen
entrapped or associated therewith to the immune system of the host.

2.3.1 VLP Production Using the Synthetic Expression Cassettes of the Present Invention

Experiments can be performed in support of the present invention to demonstrate that
the synthetic expression cassettes of the present invention provide superior production of both
Gag proteins and VLPs, relative to native Gag coding sequences. Further, electron
microscopic evaluation of VLP production can show that free and budding immature virus
particles of the expected size are produced by cells containing the synthetic expression
cassettes.
Using the synthetic expression cassettes of the present invention, rather than native
Gag coding sequences, for the production of virus-like particles provides several advantages.
First, VLPs can be produced in enhanced quantity, making isolation and purification of the
VLPs easier. Second, VLPs can be produced in a variety of cell types using the synthetic
expression cassettes; in particular, mammalian cell lines, for example, CHO cells, can be
used for VLP production. Production using CHO cells provides (i) VLP formation; (ii) correct
myristoylation and budding; (iii) absence of non-mammalian cell contaminants (e.g., insect
viruses and/or cells); and (iv) ease of purification. The synthetic expression cassettes of the
present invention are also useful for enhanced expression in cell types other than mammalian
cell lines. For example, infection of insect cells with baculovirus vectors encoding the
synthetic expression cassettes results in higher levels of total Gag protein yield and higher
levels of VLP production (relative to wild-type coding sequences). Further, the final product
from insect cells infected with the baculovirus-Gag synthetic expression cassettes
consistently contains lower amounts of contaminating insect proteins than the final product
when wild-type coding sequences are used.

VLPs can spontaneously form when the particle-forming polypeptide of interest is
recombinantly expressed in an appropriate host cell. Thus, the VLPs produced using the
synthetic expression cassettes of the present invention are conveniently prepared using
recombinant techniques. As discussed below, the Gag polypeptide-encoding synthetic
expression cassettes of the present invention can include other polypeptide coding sequences
of interest (for example, HIV protease, HIV polymerase, HCV core, Env, synthetic Env; see
Example 1). Expression of such synthetic expression cassettes yields VLPs comprising the
Gag polypeptide as well as the polypeptide of interest.

Once coding sequences for the desired particle-forming polypeptides have been
isolated or synthesized, they can be cloned into any suitable vector or replicon for expression.
Numerous cloning vectors are known to those of skill in the art, and the selection of an
appropriate cloning vector is a matter of choice. See, generally, Sambrook et al., supra. The
vector is then used to transform an appropriate host cell. Suitable recombinant expression
systems include, but are not limited to, bacterial, mammalian, baculovirus/insect, vaccinia,
Semliki Forest virus (SFV), alphavirus (such as Sindbis and Venezuelan equine encephalitis
(VEE)), yeast and Xenopus expression systems, well known in the art.
Particularly preferred expression systems are mammalian cell lines, vaccinia, Sindbis, insect
and yeast systems.
For example, a number of mammalian cell lines are known in the art and include
immortalized cell lines available from the American Type Culture Collection (A.T.C.C.),
such as, but not limited to, Chinese hamster ovary (CHO) cells, HeLa cells, baby hamster
kidney (BHK) cells, monkey kidney cells (COS), as well as others. Similarly, bacterial hosts
such as E. coli, Bacillus subtilis, and Streptococcus spp. will find use with the present
expression constructs. Yeast hosts useful in the present invention include, inter alia,
Saccharomyces cerevisiae, Candida albicans, Candida maltosa, Hansenula polymorpha,
Kluyveromyces fragilis, Kluyveromyces lactis, Pichia guilliermondii, Pichia pastoris,
Schizosaccharomyces pombe and Yarrowia lipolytica. Insect cells for use with baculovirus
expression vectors include, inter alia, Aedes aegypti, Autographa californica, Bombyx mori,
Drosophila melanogaster, Spodoptera frugiperda, and Trichoplusia ni. See, e.g., Summers
and Smith, Texas Agricultural Experiment Station Bulletin No. 1555 (1987).

Viral vectors, such as those derived from the pox family of viruses, including vaccinia
virus and avian poxvirus, can be used for the production of particles in eucaryotic cells.
Additionally, a vaccinia-based infection/transfection system, as described in Tomei et al., J.
Virol. (1993) 67:4017-4026 and Selby et al., J. Gen. Virol. (1993) 74:1103-1113, will also
find use with the present invention. In this system, cells are first infected in vitro with a
vaccinia virus recombinant that encodes the bacteriophage T7 RNA polymerase. This
polymerase displays exquisite specificity in that it only transcribes templates bearing T7
promoters. Following infection, cells are transfected with the DNA of interest, driven by a
T7 promoter. The polymerase expressed in the cytoplasm from the vaccinia virus
recombinant transcribes the transfected DNA into RNA, which is then translated into protein
by the host translational machinery. Alternatively, T7 can be added as a purified protein or
enzyme as in the "Progenitor" system (Studier and Moffatt, J. Mol. Biol. (1986)
189:113-130). The method provides for high-level, transient, cytoplasmic production of large
quantities of RNA and its translation products.
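The template specificity described above can be illustrated with a short scan for the T7 promoter. The following Python sketch assumes the widely used 18-nt T7 class III consensus TAATACGACTCACTATAG, in which the final G is the +1 transcription start, and reports that position for every exact match. Searching only the strand supplied and requiring an exact match are simplifying assumptions; the function name is hypothetical.

```python
from typing import List

T7_PROMOTER = "TAATACGACTCACTATAG"  # positions -17 to +1; the final G is +1


def t7_start_sites(template: str) -> List[int]:
    """Return 0-based indices of the +1 G for each exact T7 promoter match."""
    template = template.upper()
    sites = []
    i = template.find(T7_PROMOTER)
    while i != -1:
        sites.append(i + len(T7_PROMOTER) - 1)
        i = template.find(T7_PROMOTER, i + 1)
    return sites
```

A construct consisting of two spacer bases, the promoter and the downstream bases GGAGA gives a single start site at index 19; a template lacking the promoter gives an empty list, mirroring the point that T7 RNA polymerase does not transcribe such templates.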

Depending on the expression system and host selected, the VLPs are produced by
growing host cells transformed by an expression vector under conditions whereby the 
particle-forming polypeptide is expressed and VLPs can be formed. The selection of the 
appropriate growth conditions is within the skill of the art. If the VLPs are formed 
intracellularly, the cells are then disrupted, using chemical, physical or mechanical means, 
which lyse the cells yet keep the VLPs substantially intact. Such methods are known to those 
of skill in the art and are described in, e.g., Protein Purification Applications: A Practical 
Approach, (E.L.V. Harris and S. Angal, Eds, 1990). 

The particles are then isolated (or substantially purified) using methods that preserve
their integrity, such as gradient centrifugation (e.g., cesium chloride (CsCl) or sucrose
gradients), pelleting and the like (see, e.g., Kirnbauer et al., J. Virol. (1993) 67:6929-6936),
as well as standard purification techniques including, e.g., ion exchange and gel filtration
chromatography.

VLPs produced by cells containing the synthetic expression cassettes of the present
invention can be used to elicit an immune response when administered to a subject. One
advantage of the present invention is that VLPs can be produced by mammalian cells
carrying the synthetic expression cassettes at levels previously not possible. As discussed
above, the VLPs can comprise a variety of antigens in addition to the Gag polypeptide (e.g.,
Gag-protease, Gag-polymerase, Env, synthetic Env, etc.). Purified VLPs, produced using the
synthetic expression cassettes of the present invention, can be administered to a vertebrate
subject, usually in the form of vaccine compositions. Combination vaccines may also be
used, where such vaccines contain, for example, an adjuvant subunit protein (e.g., Env).
Administration can take place using the VLPs formulated alone or formulated with other
antigens. Further, the VLPs can be administered prior to, concurrent with, or subsequent to
delivery of the synthetic expression cassettes for DNA immunization (see below) and/or
delivery of other vaccines. Also, the site of VLP administration may be the same as or
different from that of other vaccine compositions that are being administered. Gene delivery
can be accomplished by a number of methods including, but not limited to, immunization with
DNA, alphavirus vectors, pox virus vectors, and vaccinia virus vectors.

VLP immune-stimulating (or vaccine) compositions can include various excipients,
adjuvants, carriers, auxiliary substances, modulating agents, and the like. The
immune-stimulating compositions will include an amount of the VLP/antigen sufficient to
mount an immunological response. An appropriate effective amount can be determined by one
of skill in the art. Such an amount will fall in a relatively broad range that can be determined
through routine trials and will generally be on the order of about 0.1 ng to about 1000 µg,
more preferably about 1 µg to about 300 µg, of VLP/antigen.
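Because the stated range spans two units, it may help to put both ends on a common scale: 0.1 ng to 1000 µg is 0.1 ng to 1,000,000 ng, about seven orders of magnitude, and the preferred 1 µg to 300 µg window is 1,000 ng to 300,000 ng. The Python sketch below is illustrative arithmetic only; treating the "about" bounds as hard, inclusive limits is an assumption, and the function names are hypothetical.

```python
NG_PER_UNIT = {"ng": 1.0, "ug": 1_000.0, "mg": 1_000_000.0}

BROAD_NG = (0.1, 1_000_000.0)        # about 0.1 ng to about 1000 ug
PREFERRED_NG = (1_000.0, 300_000.0)  # about 1 ug to about 300 ug


def to_ng(amount: float, unit: str) -> float:
    """Convert an amount given in ng, ug or mg to nanograms."""
    try:
        return amount * NG_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"unsupported unit {unit!r}") from None


def classify_amount(amount: float, unit: str) -> str:
    """Place an amount of VLP/antigen relative to the ranges stated in the text."""
    ng = to_ng(amount, unit)
    if PREFERRED_NG[0] <= ng <= PREFERRED_NG[1]:
        return "within preferred range"
    if BROAD_NG[0] <= ng <= BROAD_NG[1]:
        return "within broad range"
    return "outside stated ranges"
```

For instance, 50 µg (50,000 ng) falls in the preferred window, 5 ng only in the broad range, and 2 mg (2,000,000 ng) outside both.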

A carrier is optionally present, which is a molecule that does not itself induce the
production of antibodies harmful to the individual receiving the composition. Suitable
carriers are typically large, slowly metabolized macromolecules such as proteins,
polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids, amino acid
copolymers, lipid aggregates (such as oil droplets or liposomes), and inactive virus particles.
Examples of particulate carriers include those derived from polymethyl methacrylate
polymers, as well as microparticles derived from poly(lactides) and
poly(lactide-co-glycolides), known as PLG. See, e.g., Jeffery et al., Pharm. Res. (1993)
10:362-368; McGee JP, et al., J. Microencapsul. 14(2):197-210, 1997; O'Hagan DT, et al.,
Vaccine 11(2):149-54, 1993. Such carriers are well known to those of ordinary skill in the art.
Additionally, these carriers may function as immunostimulating agents ("adjuvants").
Furthermore, the antigen may be conjugated to a bacterial toxoid, such as toxoid from
diphtheria, tetanus, cholera, etc., as well as toxins derived from E. coli.

Adjuvants may also be used to enhance the effectiveness of the compositions. Such
adjuvants include, but are not limited to: (1) aluminum salts (alum), such as aluminum
hydroxide, aluminum phosphate, aluminum sulfate, etc.; (2) oil-in-water emulsion
formulations (with or without other specific immunostimulating agents such as muramyl
peptides (see below) or bacterial cell wall components), such as, for example, (a) MF59
(International Publication No. WO 90/14837), containing 5% squalene, 0.5% Tween 80, and
0.5% Span 85 (optionally containing various amounts of MTP-PE (see below), although not
required) formulated into submicron particles using a microfluidizer such as a Model 110Y
microfluidizer (Microfluidics, Newton, MA), (b) SAF, containing 10% squalane, 0.4%
Tween 80, 5% pluronic-blocked polymer L121, and thr-MDP (see below), either
microfluidized into a submicron emulsion or vortexed to generate a larger particle size
emulsion, and (c) Ribi™ adjuvant system (RAS) (Ribi Immunochem, Hamilton, MT),
containing 2% squalene, 0.2% Tween 80, and one or more bacterial cell wall components
from the group consisting of monophosphoryl lipid A (MPL), trehalose dimycolate (TDM),
and cell wall skeleton (CWS), preferably MPL + CWS (Detox™); (3) saponin adjuvants,
such as Stimulon™ (Cambridge Bioscience, Worcester, MA), or particles generated
therefrom such as ISCOMs (immunostimulating complexes); (4) Complete Freund's
Adjuvant (CFA) and Incomplete Freund's Adjuvant (IFA); (5) cytokines, such as
interleukins (IL-1, IL-2, etc.), macrophage colony stimulating factor (M-CSF), tumor necrosis
factor (TNF), etc.; (6) oligonucleotides or polymeric molecules encoding immunostimulatory
CpG motifs (Davis, H.L., et al., J. Immunology 160:870-876, 1998; Sato, Y. et al., Science
273:352-354, 1996) or complexes of antigens/oligonucleotides {Polymeric molecules include
double- and single-stranded RNA and DNA, and backbone modifications thereof, for example,
methylphosphonate linkages. Further, such polymeric molecules include alternative polymer
backbone structures such as, but not limited to, polyvinyl backbones (Pitha, Biochim.
Biophys. Acta, 204:39, 1970a; Pitha, Biopolymers, 9:965, 1970b), and morpholino backbones
(Summerton, J., et al., U.S. Patent No. 5,142,047, issued 08/25/92; Summerton, J., et al.,
U.S. Patent No. 5,185,444, issued 02/09/93). A variety of other charged and uncharged
polynucleotide analogs have been reported. Numerous backbone modifications are known in
the art, including, but not limited to, uncharged linkages (e.g., methyl phosphonates,
phosphotriesters, phosphoamidates, and carbamates) and charged linkages (e.g.,
phosphorothioates and phosphorodithioates).}; (7) detoxified mutants of a bacterial
ADP-ribosylating toxin such as a cholera toxin (CT), a pertussis toxin (PT), or an E. coli
heat-labile toxin (LT), particularly LT-K63 (where lysine is substituted for the wild-type
amino acid at position 63), LT-R72 (where arginine is substituted for the wild-type amino acid
at position 72), CT-S109 (where serine is substituted for the wild-type amino acid at position
109), and PT-K9/G129 (where lysine is substituted for the wild-type amino acid at position 9
and glycine substituted at position 129) (see, e.g., International Publication Nos. WO 93/13202
and WO 92/19265); and (8) other substances that act as immunostimulating agents to enhance
the effectiveness of the VLP immune-stimulating (or vaccine) composition. Alum, CpG
oligonucleotides, and MF59 are preferred.
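The toxin mutant designations in item (7) follow a compact convention: a toxin abbreviation, then one or more substitutions, each written as the one-letter code of the substituted residue followed by its position, with the wild-type residue left unstated. A hypothetical Python parser for that convention (the function name and the use of the standard one-letter amino acid codes are assumptions for illustration) might look like this:

```python
import re
from typing import List, Tuple

# Standard one-letter to three-letter amino acid codes.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
_SUBSTITUTION = re.compile(r"([A-Z])(\d+)")


def parse_toxin_mutant(name: str) -> Tuple[str, List[Tuple[str, int]]]:
    """Split e.g. 'PT-K9/G129' into ('PT', [('Lys', 9), ('Gly', 129)])."""
    toxin, sep, subs = name.partition("-")
    if not sep or not toxin or not subs:
        raise ValueError(f"expected TOXIN-SUBSTITUTIONS, got {name!r}")
    parsed = []
    for token in subs.split("/"):
        match = _SUBSTITUTION.fullmatch(token)
        if match is None or match.group(1) not in THREE_LETTER:
            raise ValueError(f"unrecognized substitution {token!r}")
        parsed.append((THREE_LETTER[match.group(1)], int(match.group(2))))
    return toxin, parsed
```

For example, `parse_toxin_mutant("LT-K63")` returns `("LT", [("Lys", 63)])`, matching the description of LT-K63 given above.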

Muramyl peptides include, but are not limited to, N-acetyl-muramyl-L-threonyl-D-isoglutamine
(thr-MDP), N-acetyl-normuramyl-L-alanyl-D-isoglutamine (nor-MDP),
N-acetylmuramyl-L-alanyl-D-isoglutam... hydroxyphosphoryloxy)-ethylamine (MTP-PE), etc.

Dosage treatment with the VLP composition may be a single dose schedule or a
multiple dose schedule. A multiple dose schedule is one in which a primary course of
vaccination may be with 1-10 separate doses, followed by other doses given at subsequent
time intervals, chosen to maintain and/or reinforce the immune response, for example, at 1-4
months for a second dose, and if needed, a subsequent dose(s) after several months. The
dosage regimen will also, at least in part, be determined by the need of the subject and be
dependent on the judgment of the practitioner.

If prevention of disease is desired, the antigen-carrying VLPs are generally
administered prior to primary infection with the pathogen of interest. If treatment is desired,
e.g., the reduction of symptoms or recurrences, the VLP compositions are generally
administered subsequent to primary infection.

2.3.2 Using the Synthetic Expression Cassettes of the Present Invention to Create Packaging Cell Lines

A number of viral-based systems have been developed for use as gene transfer vectors
for mammalian host cells. For example, retroviruses (in particular, lentiviral vectors) provide
a convenient platform for gene delivery systems. A coding sequence of interest (for example,
a sequence useful for gene therapy applications) can be inserted into a gene delivery vector
and packaged in retroviral particles using techniques known in the art. Recombinant virus
can then be isolated and delivered to cells of the subject either in vivo or ex vivo. A number
of retroviral systems have been described, including, for example, the following: U.S.
Patent No. 5,219,740; Miller et al. (1989) BioTechniques 7:980; Miller, A.D. (1990) Human
Gene Therapy 1:5; Scarpa et al. (1991) Virology 180:849; Burns et al. (1993) Proc. Natl.
Acad. Sci. USA 90:8033; Boris-Lawrie et al. (1993) Cur. Opin. Genet. Develop. 3:102; GB
2200651; EP 0415731; EP 0345242; WO 89/02468; WO 89/05349; WO 89/09271; WO
90/02806; WO 90/07936; WO 94/03622; WO 93/25698; WO 93/25234; WO 93/11230;
WO 93/10218; WO 91/02805; U.S. 4,405,712; U.S. 4,861,719; U.S. 4,980,289 and
U.S. 4,777,127; U.S. Serial No. 07/800,921; Vile (1993) Cancer Res. 53:3860-3864; Vile
(1993) Cancer Res. 53:962-967; Ram (1993) Cancer Res. 53:83-88; Takamiya (1992)
J. Neurosci. Res. 33:493-503; Baba (1993) J. Neurosurg. 79:729-735; Mann (1983) Cell
33:153; Cane (1984) Proc. Natl. Acad. Sci. USA 81:6349; and Miller (1990) Human Gene
Therapy 1.

In other embodiments, gene transfer vectors can be constructed to encode a cytokine
or other immunomodulatory molecule. For example, nucleic acid sequences encoding native
IL-2 and gamma-interferon can be obtained as described in U.S. Patent Nos. 4,738,927 and
5,326,859, respectively, while useful muteins of these proteins can be obtained as described
in U.S. Patent No. 4,853,332. Nucleic acid sequences encoding the short and long forms of
mCSF can be obtained as described in U.S. Patent Nos. 4,847,201 and 4,879,227,
respectively. In particular aspects of the invention, retroviral vectors expressing cytokine or
immunomodulatory genes can be produced as described herein (for example, employing the
packaging cell lines of the present invention) and in International Application No.
PCT/US94/02951, entitled "Compositions and Methods for Cancer Immunotherapy."

Examples of suitable immunomodulatory molecules for use herein include the
following: IL-1 and IL-2 (Karupiah et al. (1990) J. Immunology 144:290-298, Weber et al.
(1987) J. Exp. Med. 166:1716-1733, Gansbacher et al. (1990) J. Exp. Med. 172:1217-1224,
and U.S. Patent No. 4,738,927); IL-3 and IL-4 (Tepper et al. (1989) Cell 57:503-512,
Golumbek et al. (1991) Science 254:713-716, and U.S. Patent No. 5,017,691); IL-5 and IL-6
(Brakenhof et al. (1987) J. Immunol. 139:4116-4121, and International Publication No. WO
90/06370); IL-7 (U.S. Patent No. 4,965,195); IL-8, IL-9, IL-10, IL-11, IL-12, and IL-13
(Cytokine Bulletin, Summer 1994); IL-14 and IL-15; alpha interferon (Finter et al. (1991)
Drugs 42:749-765, U.S. Patent Nos. 4,892,743 and 4,966,843, International Publication No.
WO 85/02862, Nagata et al. (1980) Nature 284:316-320, Familletti et al. (1981) Methods in
Enz. 78:387-394, Twu et al. (1989) Proc. Natl. Acad. Sci. USA 86:2046-2050, and Faktor et
al. (1990) Oncogene 5:867-872); beta-interferon (Seif et al. (1991) Virol. 65:664-671);
gamma-interferons (Radford et al. (1991) The American Society of Hepatology 2008-2015,
Watanabe et al. (1989) Proc. Natl. Acad. Sci. USA 86:9456-9460, Gansbacher et al. (1990)
Cancer Research 50:7820-7825, Maio et al. (1989) Can. Immunol. Immunother. 30:34-42,
and U.S. Patent Nos. 4,762,791 and 4,727,138); G-CSF (U.S. Patent Nos. 4,999,291 and
4,810,643); and GM-CSF (International Publication No. WO 85/04188).

Immunomodulatory factors may also be agonists, antagonists, or ligands for these
molecules. For example, soluble forms of receptors can often behave as antagonists for these
types of factors, as can mutated forms of the factors themselves.

Nucleic acid molecules that encode the above-described substances, as well as other
nucleic acid molecules that are advantageous for use within the present invention, may be
readily obtained from a variety of sources, including, for example, depositories such as the
American Type Culture Collection, or from commercial sources such as British Bio-Technology
Limited (Cowley, Oxford, England). Representative examples include BBG 12
(containing the GM-CSF gene coding for the mature protein of 127 amino acids), BBG 6
(which contains sequences encoding gamma interferon), A.T.C.C. Deposit No. 39656 (which
contains sequences encoding TNF), A.T.C.C. Deposit No. 20663 (which contains sequences
encoding alpha-interferon), A.T.C.C. Deposit Nos. 31902, 31902 and 39517 (which contain
sequences encoding beta-interferon), A.T.C.C. Deposit No. 67024 (which contains a
sequence which encodes Interleukin-1b), A.T.C.C. Deposit Nos. 39405, 39452, 39516, 39626
and 39673 (which contain sequences encoding Interleukin-2), A.T.C.C. Deposit Nos. 59399,
59398, and 67326 (which contain sequences encoding Interleukin-3), A.T.C.C. Deposit No.
57592 (which contains sequences encoding Interleukin-4), A.T.C.C. Deposit Nos. 59394 and
59395 (which contain sequences encoding Interleukin-5), and A.T.C.C. Deposit No. 67153
(which contains sequences encoding Interleukin-6).

Plasmids containing cytokine genes or immunomodulatory genes (International
Publication Nos. WO 94/02951 and WO 96/21015) can be digested with appropriate
restriction enzymes, and DNA fragments containing the particular gene of interest can be
inserted into a gene transfer vector using standard molecular biology techniques. (See, e.g.,
Sambrook et al., supra, or Ausubel et al. (eds.) Current Protocols in Molecular Biology,
Greene Publishing and Wiley-Interscience.)

Polynucleotide sequences coding for the above-described molecules can be obtained
using recombinant methods, such as by screening cDNA and genomic libraries from cells
expressing the gene, or by deriving the gene from a vector known to include the same. For
example, plasmids which contain sequences that encode altered cellular products may be
obtained from a depository such as the A.T.C.C., or from commercial sources. Plasmids
containing the nucleotide sequences of interest can be digested with appropriate restriction
enzymes, and DNA fragments containing the nucleotide sequences can be inserted into a gene
transfer vector using standard molecular biology techniques.

Alternatively, cDNA sequences for use with the present invention may be obtained
from cells which express or contain the sequences, using standard techniques, such as phenol
extraction and PCR of cDNA or genomic DNA. See, e.g., Sambrook et al., supra, for a
description of techniques used to obtain and isolate DNA. Briefly, mRNA from a cell which
expresses the gene of interest can be reverse transcribed with reverse transcriptase using
oligo-dT or random primers. The single-stranded cDNA may then be amplified by PCR (see
U.S. Patent Nos. 4,683,202, 4,683,195 and 4,800,159; see also PCR Technology: Principles
and Applications for DNA Amplification, Erlich (ed.), Stockton Press, 1989) using
oligonucleotide primers complementary to sequences on either side of desired sequences.
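
The placement of primers "on either side of desired sequences" can be illustrated with a minimal Python sketch. It assumes the template is given as a top-strand DNA string, that the target occupies a half-open interval, and that primers have a hypothetical fixed length taken immediately outside the target; melting temperature, GC content and secondary structure, which matter in practice, are not modeled.

```python
# Minimal sketch: choose PCR primers flanking a target region of a template.
# Assumptions (hypothetical, not from the source): fixed primer length,
# primers placed directly outside the target, no Tm/GC/dimer checks.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def flanking_primers(template: str, start: int, end: int, length: int = 20) -> tuple[str, str]:
    """Return (forward, reverse) primers flanking template[start:end].

    The forward primer copies the top strand just upstream of the target, so it
    anneals to the bottom strand; the reverse primer is the reverse complement
    of the top strand just downstream of the target, so it anneals to the top strand.
    """
    if start < length or end + length > len(template):
        raise ValueError("not enough flanking sequence for primers of this length")
    forward = template[start - length:start]
    reverse = reverse_complement(template[end:end + length])
    return forward, reverse
```

For example, `flanking_primers("ATGCCGTAAGGCTTAC", 4, 12, length=4)` returns the forward primer `"ATGC"` and the reverse primer `"GTAA"`.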

The nucleotide sequence of interest can also be produced synthetically, rather than
cloned, using a DNA synthesizer (e.g., an Applied Biosystems Model 392 DNA Synthesizer,
available from ABI, Foster City, California). The nucleotide sequence can be designed with
the appropriate codons for the expression product desired. The complete coding sequence is
assembled from overlapping oligonucleotides prepared by standard methods. See, e.g., Edge
(1981) Nature 292:756; Nambair et al. (1984) Science 223:1299; Jay et al. (1984) J. Biol.
Chem. 259:6311.
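
The overlap logic behind assembling a coding sequence from overlapping oligonucleotides can be sketched in Python. This is only the sequence bookkeeping, under stated assumptions: hypothetical 60-nt oligos with 20-nt overlaps, all written as top-strand sequence. Laboratory assembly schemes often alternate oligos between the two strands and join them enzymatically, which this sketch does not model.

```python
# Minimal sketch of overlap-based assembly bookkeeping.
# Assumptions (hypothetical, not from the source): oligo length 60 nt,
# overlap 20 nt, every oligo written as top-strand sequence.


def split_into_overlapping_oligos(seq: str, oligo_len: int = 60, overlap: int = 20) -> list[str]:
    """Cut seq into oligos; each one repeats the last `overlap` bases of the previous one."""
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than the oligo")
    step = oligo_len - overlap
    oligos = []
    start = 0
    while True:
        oligos.append(seq[start:start + oligo_len])
        if start + oligo_len >= len(seq):
            return oligos
        start += step


def assemble_from_oligos(oligos: list[str], overlap: int) -> str:
    """Join oligos end to end, checking that each overlap matches before merging."""
    assembled = oligos[0]
    for oligo in oligos[1:]:
        if assembled[-overlap:] != oligo[:overlap]:
            raise ValueError("adjacent oligos do not share the expected overlap")
        assembled += oligo[overlap:]
    return assembled
```

Splitting a designed sequence and reassembling it should reproduce the original, which is a convenient self-check on an oligo design before ordering.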

The synthetic expression cassettes of the present invention can be employed in the 
construction of packaging cell lines for use with retroviral vectors. 

One type of retrovirus, the murine leukemia virus, or "MLV", has been widely
utilized for gene therapy applications (see generally Mann et al. (Cell 33:153, 1983), Cane
and Mulligan (Proc. Nat'l. Acad. Sci. USA 81:6349, 1984), and Miller et al., Human Gene
Therapy 1:5-14, 1990).

Lentiviral vectors typically comprise a 5' lentiviral LTR, a tRNA binding site, a
packaging signal, a promoter operably linked to one or more genes of interest, an origin of
second strand DNA synthesis and a 3' lentiviral LTR, wherein the lentiviral vector contains a
nuclear transport element. The nuclear transport element may be located either upstream (5')
or downstream (3') of a coding sequence of interest (for example, a synthetic Gag or Env
expression cassette of the present invention). Within certain embodiments, the nuclear
transport element is not RRE. Within one embodiment the packaging signal is an extended
packaging signal. Within other embodiments the promoter is a tissue-specific promoter or,
alternatively, a promoter such as CMV. Within other embodiments, the lentiviral vector
further comprises an internal ribosome entry site.
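
The element list above can be expressed as a small layout check. The following Python sketch is illustrative only: the element labels (for example `"NTE:SRV-1"` for a nuclear transport element from simian retrovirus type 1) are hypothetical names chosen for this example, and the check only verifies the 5' to 3' order of the core elements named in the text, the presence of a nuclear transport element, and, optionally, that it is not RRE.

```python
# Hypothetical element labels; the order follows the 5'->3' list in the text.
CORE_ORDER = (
    "5'LTR",
    "tRNA_binding_site",
    "packaging_signal",
    "promoter",
    "gene",
    "second_strand_origin",
    "3'LTR",
)


def check_lentiviral_layout(elements: list[str], forbid_rre: bool = False) -> list[str]:
    """Return a list of problems found in an ordered list of vector element labels.

    Core elements must appear in CORE_ORDER; other elements (a nuclear transport
    element labelled "NTE:<name>", an "IRES", ...) may be interleaved. At least
    one nuclear transport element is required; with forbid_rre=True it must not be RRE.
    """
    problems = []
    positions = []
    for name in CORE_ORDER:
        if name in elements:
            positions.append(elements.index(name))
        else:
            problems.append(f"missing {name}")
    if not problems and positions != sorted(positions):
        problems.append("core elements out of 5'->3' order")
    transport = [e for e in elements if e.startswith("NTE:")]
    if not transport:
        problems.append("no nuclear transport element")
    elif forbid_rre and "NTE:RRE" in transport:
        problems.append("nuclear transport element is RRE")
    return problems
```

An empty result means the layout satisfies these structural rules; it says nothing about whether the vector is functional.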

A wide variety of lentiviruses may be utilized within the context of the present
invention, including for example, lentiviruses selected from the group consisting of HIV,
HIV-1, HIV-2, FIV and SIV.

In one embodiment of the present invention synthetic Gag-polymerase expression 
cassettes are provided comprising a promoter and a sequence encoding synthetic Gag- 
polymerase and at least one of vpr, vpu, nef or vif, wherein the promoter is operably linked to 
Gag-polymerase and vpr, vpu, nef or vif. 

Within yet another aspect of the invention, host cells (e.g., packaging cell lines) are
provided which contain any of the expression cassettes described herein. For example, within
one aspect, packaging cell lines are provided comprising an expression cassette that comprises
a sequence encoding synthetic Gag-polymerase, and a nuclear transport element, wherein the
promoter is operably linked to the sequence encoding Gag-polymerase. Packaging cell lines
may further comprise a promoter and a sequence encoding tat, rev, or an envelope, wherein
the promoter is operably linked to the sequence encoding tat, rev, Env or sequences encoding
modified versions of these proteins. The packaging cell line may further comprise a sequence
encoding any one or more of nef, vif, vpu or vpr (wild-type or synthetic).

In one embodiment, the expression cassette (carrying, for example, the synthetic Gag- 
polymerase) is stably integrated. The packaging cell line, upon introduction of a lentiviral 
vector, typically produces particles. The promoter regulating expression of the synthetic 
expression cassette may be inducible. Typically, the packaging cell line, upon introduction of 
a lentiviral vector, produces particles that are essentially free of replication competent virus. 

Packaging cell lines are provided comprising an expression cassette which directs the
expression of a synthetic Gag-polymerase gene or comprising an expression cassette which
directs the expression of a synthetic Env gene described herein. (See also Andre, S., et al.,
Journal of Virology 72(2):1497-1503, 1998; Haas, J., et al., Current Biology 6(3):315-324,
1996, for a description of other modified Env sequences.) A lentiviral vector is introduced
into the packaging cell line to produce a vector producing cell line.

As noted above, lentiviral vectors can be designed to carry or express a selected
gene(s) or sequences of interest. Lentiviral vectors may be readily constructed from a wide
variety of lentiviruses (see RNA Tumor Viruses, Second Edition, Cold Spring Harbor
Laboratory, 1985). Representative examples of lentiviruses include HIV, HIV-1, HIV-2,
FIV and SIV. Such lentiviruses may either be obtained from patient isolates, or, more
preferably, from depositories or collections such as the American Type Culture Collection, or
isolated from known sources using available techniques.

Portions of the lentiviral gene delivery vectors (or vehicles) may be derived from
different viruses. For example, in a given recombinant lentiviral vector, LTRs may be
derived from an HIV, a packaging signal from SIV, and an origin of second strand synthesis
from HIV-2. Lentiviral vector constructs may comprise a 5' lentiviral LTR, a tRNA binding
site, a packaging signal, one or more heterologous sequences, an origin of second strand
DNA synthesis and a 3' LTR, wherein said lentiviral vector contains a nuclear transport
element that is not RRE.

Briefly, Long Terminal Repeats ("LTRs") are subdivided into three elements,
designated U5, R and U3. These elements contain a variety of signals which are responsible
for the biological activity of a retrovirus, including, for example, promoter and enhancer
elements which are located within U3. LTRs may be readily identified in the provirus
(integrated DNA form) due to their precise duplication at either end of the genome. As
utilized herein, a 5' LTR should be understood to include a 5' promoter element and sufficient
LTR sequence to allow reverse transcription and integration of the DNA form of the vector.
The 3' LTR should be understood to include a polyadenylation signal, and sufficient LTR
sequence to allow reverse transcription and integration of the DNA form of the vector.
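
Because the two LTRs of a provirus are precise duplicates at either end, a candidate LTR can be located by finding the longest sequence that occurs identically as both prefix and suffix. The following Python sketch assumes the input has already been trimmed to the provirus itself (no flanking host DNA) and uses a simple quadratic scan, which is adequate for illustration but not tuned for speed.

```python
def terminal_repeat_length(provirus: str) -> int:
    """Length of the longest sequence occurring identically at both ends.

    Assumes `provirus` contains only the integrated viral DNA, so the 5' and
    3' LTRs sit exactly at its two ends. Returns 0 if no terminal repeat exists.
    """
    n = len(provirus)
    # Try the longest candidates first; a terminal repeat cannot exceed half the sequence.
    for k in range(n // 2, 0, -1):
        if provirus[:k] == provirus[n - k:]:
            return k
    return 0
```

The returned length marks the LTR boundaries, `provirus[:k]` and `provirus[-k:]`; subdividing each LTR into U3, R and U5 requires additional annotation not modeled here.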

The tRNA binding site and origin of second strand DNA synthesis are also important
for a retrovirus to be biologically active, and may be readily identified by one of skill in the
art. For example, retroviral tRNA binds to a tRNA binding site by Watson-Crick base
pairing, and is carried with the retrovirus genome into a viral particle. The tRNA is then
utilized as a primer for DNA synthesis by reverse transcriptase. The tRNA binding site may
be readily identified based upon its location just downstream from the 5' LTR. Similarly, the
origin of second strand DNA synthesis is, as its name implies, important for the second strand
DNA synthesis of a retrovirus. This region, which is also referred to as the poly-purine tract,
is located just upstream of the 3' LTR.

In addition to a 5' and 3' LTR, tRNA binding site, and origin of second strand DNA
synthesis, recombinant retroviral vector constructs may also comprise a packaging signal, as
well as one or more genes or coding sequences of interest. In addition, the lentiviral vectors
have a nuclear transport element which, in preferred embodiments, is not RRE.
Representative examples of suitable nuclear transport elements include the element in Rous
sarcoma virus (Ogert, et al., J. Virol. 70, 3834-3843, 1996), the element in Rous sarcoma virus
(Liu & Mertz, Genes & Dev., 9, 1766-1789, 1995) and the element in the genome of simian
retrovirus type I (Zolotukhin, et al., J. Virol. 68, 7944-7952, 1994). Other potential elements
include the elements in the histone gene (Kedes, Annu. Rev. Biochem. 48, 837-870, 1970),
the alpha-interferon gene (Nagata et al., Nature 287, 401-408, 1980), the beta-adrenergic
receptor gene (Kobilka, et al., Nature 329, 75-79, 1987), and the c-Jun gene (Hattori, et al.,
Proc. Natl. Acad. Sci. USA 85, 9148-9152, 1988).

Recombinant lentiviral vector constructs typically lack both Gag-polymerase and Env
coding sequences. Recombinant lentiviral vectors typically contain fewer than 20, preferably
15, more preferably 10, and most preferably 8 consecutive nucleotides found in
Gag-polymerase and Env genes. One advantage of the present invention is that the synthetic
Gag-polymerase expression cassettes, which can be used to construct packaging cell lines for the
recombinant retroviral vector constructs, have little homology to wild-type Gag-polymerase
sequences and thus considerably reduce or eliminate the possibility of homologous
recombination between the synthetic and wild-type sequences.
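
The "fewer than N consecutive nucleotides" criterion can be checked computationally by finding the longest run of consecutive nucleotides shared between a vector and the Gag-polymerase or Env sequences. The Python sketch below binary-searches on run length using k-mer sets; it compares the given strands only (reverse-complement matches are not considered), and the function names are hypothetical.

```python
def longest_shared_run(seq_a: str, seq_b: str) -> int:
    """Length of the longest run of consecutive nucleotides present in both sequences.

    Compares the strands as given; reverse-complement matches are not checked.
    Binary search is valid because sharing a run of length k implies sharing
    every shorter run.
    """
    lo, hi = 0, min(len(seq_a), len(seq_b))
    while lo < hi:
        mid = (lo + hi + 1) // 2
        kmers_b = {seq_b[i:i + mid] for i in range(len(seq_b) - mid + 1)}
        if any(seq_a[i:i + mid] in kmers_b for i in range(len(seq_a) - mid + 1)):
            lo = mid
        else:
            hi = mid - 1
    return lo


def meets_homology_limit(vector: str, helper_seqs: list[str], limit: int = 20) -> bool:
    """True if the vector shares fewer than `limit` consecutive nucleotides with every helper sequence."""
    return all(longest_shared_run(vector, helper) < limit for helper in helper_seqs)
```

With `limit` set to 20, 15, 10 or 8, this mirrors the increasingly preferred thresholds stated above.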

Lentiviral vectors may also include tissue-specific promoters to drive expression of
one or more genes or sequences of interest.

Lentiviral vector constructs may be generated such that more than one gene of interest
is expressed. This may be accomplished through the use of di- or oligo-cistronic cassettes
(e.g., where the coding regions are separated by 80 nucleotides or less; see generally Levin et
al., Gene 108:167-174, 1991), or through the use of Internal Ribosome Entry Sites ("IRES").
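
The spacing criterion for di- or oligo-cistronic cassettes (coding regions separated by 80 nucleotides or less) is easy to express as a check on coding-region coordinates. In this Python sketch the coordinates are assumed to be half-open `(start, end)` pairs on one strand; overlapping coding regions (a negative gap) are treated as not meeting the criterion, which is an assumption of this example.

```python
def intergenic_gaps(orfs: list[tuple[int, int]]) -> list[int]:
    """Gaps in nucleotides between consecutive coding regions given as half-open (start, end)."""
    ordered = sorted(orfs)
    return [next_start - prev_end
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]


def is_compact_multicistronic(orfs: list[tuple[int, int]], max_gap: int = 80) -> bool:
    """True if there are at least two coding regions and every gap is 0..max_gap nt."""
    if len(orfs) < 2:
        return False
    return all(0 <= gap <= max_gap for gap in intergenic_gaps(orfs))
```

For instance, coding regions at `(0, 300)`, `(350, 900)` and `(960, 1200)` have gaps of 50 and 60 nucleotides and therefore satisfy the 80-nucleotide criterion.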

Packaging cell lines suitable for use with the above-described recombinant retroviral
vector constructs may be readily prepared given the disclosure provided herein. Briefly, the
parent cell line from which the packaging cell line is derived can be selected from a variety of
mammalian cell lines, including for example, 293, RD, COS-7, CHO, BHK, VERO, HT1080,
and myeloma cells.

After selection of a suitable host cell for the generation of a packaging cell line, one or
more expression cassettes are introduced into the cell line in order to complement or supply
in trans components of the vector which have been deleted.

Representative examples of suitable expression cassettes have been described herein
and include synthetic Env, synthetic Gag, synthetic Gag-protease, and synthetic
Gag-polymerase expression cassettes, which comprise a promoter and a sequence encoding, e.g.,
Gag-polymerase and at least one of vpr, vpu, nef or vif, wherein the promoter is operably
linked to Gag-polymerase and vpr, vpu, nef or vif. As described above, the native and/or
synthetic coding sequences may also be utilized in these expression cassettes.

Utilizing the above-described expression cassettes, a wide variety of packaging cell
lines can be generated. For example, within one aspect, packaging cell lines are provided
comprising an expression cassette that comprises a sequence encoding synthetic
Gag-polymerase, and a nuclear transport element, wherein the promoter is operably linked to the
sequence encoding Gag-polymerase. Within other aspects, packaging cell lines are provided
comprising a promoter and a sequence encoding tat, rev, Env, or other HIV antigens or
epitopes derived therefrom, wherein the promoter is operably linked to the sequence encoding
tat, rev, Env, or the HIV antigen or epitope. Within further embodiments, the packaging cell
line may comprise a sequence encoding any one or more of nef, vif, vpu or vpr. For example,
the packaging cell line may contain nef, vif, vpu, or vpr alone; nef and vif; nef and vpu;
nef and vpr; vif and vpu; vif and vpr; vpu and vpr; nef, vif and vpu; nef, vif and vpr; nef, vpu
and vpr; vif, vpu and vpr; or all four of nef, vif, vpu, and vpr.
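
The combinations enumerated above are exactly the non-empty subsets of the four accessory genes: 4 single genes, 6 pairs, 4 triples and 1 set of all four, for 15 in total. A short Python sketch using the standard `itertools.combinations` function generates them.

```python
from itertools import combinations

ACCESSORY_GENES = ("nef", "vif", "vpu", "vpr")


def accessory_gene_sets(genes: tuple[str, ...] = ACCESSORY_GENES) -> list[tuple[str, ...]]:
    """All non-empty combinations of the accessory genes, smallest first."""
    return [combo for size in range(1, len(genes) + 1) for combo in combinations(genes, size)]
```

Such an enumeration can be used, for example, to lay out a systematic set of packaging cell line variants.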

In one embodiment, the expression cassette is stably integrated. Within another 
embodiment, the packaging cell line, upon introduction of a lentiviral vector, produces 
particles. Within further embodiments the promoter is inducible. Within certain preferred 
embodiments of the invention, the packaging cell line, upon introduction of a lentiviral 
vector, produces particles that are free of replication competent virus. 

The synthetic cassettes containing modified coding sequences are transfected into a 
selected cell line. Transfected cells are selected that (i) carry, typically, integrated, stable 
copies of the HIV coding sequences, and (ii) are expressing acceptable levels of these
polypeptides (expression can be evaluated by methods known in the prior art, e.g., see 
Examples 1-4). The ability of the cell line to produce VLPs may also be verified. 

A sequence of interest is constructed into a suitable viral vector as discussed above. 
This defective virus is then transfected into the packaging cell line. The packaging cell line 
provides the viral functions necessary for producing virus-like particles into which the 
defective viral genome, containing the sequence of interest, is packaged. These VLPs are
then isolated and can be used, for example, in gene delivery or gene therapy. 

Further, such packaging cell lines can also be used to produce VLPs alone, which can, 
for example, be used as adjuvants for administration with other antigens or in vaccine
compositions. Also, co-expression of a selected sequence of interest encoding a polypeptide 
(for example, an antigen) in the packaging cell line can also result in the entrapment and/or 
association of the selected polypeptide in/with the VLPs. 

Various forms of the different embodiments of the present invention (e.g., constructs) 
may be combined. 

2.4 DNA Immunization and Gene Delivery 

A variety of HIV polypeptide antigens, particularly Type C HIV antigens, can be used 
in the practice of the present invention. HIV antigens can be included in DNA immunization
constructs containing, for example, a synthetic Gag expression cassette fused in-frame to a
coding sequence for the polypeptide antigen (synthetic or wild-type), where expression of the 
construct results in VLPs presenting the antigen of interest. 

HIV antigens of particular interest to be used in the practice of the present invention
include tat, rev, nef, vif, vpu, vpr, and other HIV antigens or epitopes derived therefrom.
These antigens may be synthetic (as described herein) or wild-type. Further, the packaging
cell line may contain only nef, and HIV-1 (also known as HTLV-III, LAV, ARV, etc.),
including, but not limited to, antigens such as gp120, gp41, gp160 (both native and
modified); Gag; and pol from a variety of isolates including, but not limited to, HIVIIIb,
HIVSF2, HIV-1SF162, HIV-1SF170, HIVLAV, HIVLAI, HIVMN, HIV-1CM235, HIV-1US4, other
HIV-1 strains from diverse subtypes (e.g., subtypes A through G, and O), HIV-2 strains and
diverse subtypes (e.g., HIV-2UC1 and HIV-2UC2). See, e.g., Myers, et al., Los Alamos
Database, Los Alamos National Laboratory, Los Alamos, New Mexico; Myers, et al., Human
Retroviruses and AIDS, 1990, Los Alamos, New Mexico: Los Alamos National Laboratory.

To evaluate efficacy, DNA immunization using synthetic expression cassettes of the 
present invention can be performed, for instance as described in Example 4. Mice are 
immunized with both the Gag (and/or Env) synthetic expression cassette and the Gag (and/or 
Env) wild type expression cassette. Mouse immunizations with plasmid-DNAs will show 
that the synthetic expression cassettes provide a clear improvement of immunogenicity 
relative to the native expression cassettes. Also, the second boost immunization will induce a 
secondary immune response, for example, after approximately two weeks. Further, the 
results of CTL assays will show increased potency of synthetic Gag (and/or Env) expression 
cassettes for induction of cytotoxic T-lymphocyte (CTL) responses by DNA immunization. 

It is readily apparent that the subject invention can be used to mount an immune 
response to a wide variety of antigens and hence to treat or prevent an HIV infection,
particularly Type C HIV infection.

2.4.1 DELIVERY OF THE SYNTHETIC EXPRESSION CASSETTES OF THE PRESENT 
INVENTION 

Polynucleotide sequences coding for the above-described molecules can be obtained
using recombinant methods, such as by screening cDNA and genomic libraries from cells
expressing the gene, or by deriving the gene from a vector known to include the same.
Furthermore, the desired gene can be isolated directly from cells and tissues containing the
same, using standard techniques, such as phenol extraction and PCR of cDNA or genomic
DNA. See, e.g., Sambrook et al., supra, for a description of techniques used to obtain and
isolate DNA. The gene of interest can also be produced synthetically, rather than cloned.
The nucleotide sequence can be designed with the appropriate codons for the particular amino
acid sequence desired. In general, one will select preferred codons for the intended host in
which the sequence will be expressed. The complete coding sequence is assembled from
overlapping oligonucleotides prepared by standard methods. See, e.g., Edge, Nature (1981)
292:756; Nambair et al., Science (1984) 223:1299; Jay et al., J. Biol. Chem. (1984)
259:6311; Stemmer, W.P.C., (1995) Gene 164:49-53.
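
Selecting "preferred codons for the intended host" amounts to back-translating the amino acid sequence with one chosen codon per residue. The Python sketch below uses a hypothetical, illustrative table: every codon listed is a valid codon for its amino acid in the standard genetic code, but which synonymous codon is "preferred" is an assumption for this example, not a codon-usage table from the source.

```python
# Illustrative, hypothetical "preferred codon" table (one codon per residue).
# Each codon is valid for its amino acid in the standard genetic code; the
# choice among synonymous codons is an assumption made for this example.
PREFERRED_CODONS = {
    "A": "GCC", "R": "AGG", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
    "*": "TGA",
}


def back_translate(protein: str, table: dict[str, str] = PREFERRED_CODONS) -> str:
    """Return a DNA coding sequence using one chosen codon per residue ('*' = stop)."""
    codons = []
    for residue in protein.upper():
        if residue not in table:
            raise ValueError(f"no codon defined for residue {residue!r}")
        codons.append(table[residue])
    return "".join(codons)
```

Real codon optimization usually weighs further factors (for example, avoiding repeats or unwanted motifs); this sketch applies only the single-table choice described in the text.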

Next, the gene sequence encoding the desired antigen can be inserted into a vector
containing a synthetic expression cassette of the present invention. In certain embodiments,
the antigen is inserted into the synthetic Gag coding sequence such that when the combined
sequence is expressed it results in the production of VLPs comprising the Gag polypeptide
and the antigen of interest, e.g., Env (native or modified) or other antigen(s) (native or
modified) derived from HIV. Insertions can be made within the coding sequence or at either
end of the coding sequence (5', amino terminus of the expressed Gag polypeptide; or 3',
carboxy terminus of the expressed Gag polypeptide) (Wagner, R., et al., Arch. Virol.
127:117-137, 1992; Wagner, K., et al., Virology 200:162-175, 1994; Wu, X., et al., J. Virol.
69(6):3389-3398, 1995; Wang, C-T., et al., Virology 200:524-534, 1994; Chazal, N., et al.,
Virology 68(1):111-122, 1994; Griffiths, J.C., et al., J. Virol. 67(6):3191-3198, 1993; Reicin,
A.S., et al., J. Virol. 69(2):642-650, 1995).

Up to 50% of the coding sequences of p55Gag can be deleted without affecting
assembly into virus-like particles or expression efficiency (Borsetti, A., et al., J. Virol.
72(11):9313-9317, 1998; Garnier, L., et al., J. Virol. 72(6):4667-4677, 1998; Zhang, Y., et al.,
J. Virol. 72(3):1782-1789, 1998; Wang, C., et al., J. Virol. 72(10):7950-7959, 1998). In one
embodiment of the present invention, immunogenicity of the high-level expressing synthetic
Gag expression cassettes can be increased by the insertion of different structural or
non-structural HIV antigens, multiepitope cassettes, or cytokine sequences into deleted regions
of the Gag sequence. Such deletions may be generated following the teachings of the present
invention and information available to one of ordinary skill in the art. One possible
advantage of this approach, relative to using full-length sequences fused to heterologous
polypeptides, can be higher expression/secretion efficiency of the expression product.

When sequences are added to the amino terminal end of Gag, the polynucleotide can
contain coding sequences at the 5' end that encode a signal for addition of a myristic moiety
to the Gag-containing polypeptide (e.g., sequences that encode Met-Gly).
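
Ensuring that a coding sequence begins with codons for Met-Gly can be sketched as a small sequence edit. In the Python sketch below, the function names and the default glycine codon (GGC) are hypothetical choices. The check covers only the Met-Gly start mentioned in the text; it does not predict whether a given polypeptide will actually be myristoylated, which also depends on downstream residues.

```python
GLY_CODONS = {"GGT", "GGC", "GGA", "GGG"}  # glycine codons in the standard genetic code


def has_met_gly_start(coding_seq: str) -> bool:
    """True if the coding sequence begins with ATG followed by a glycine codon."""
    seq = coding_seq.upper()
    return seq[:3] == "ATG" and seq[3:6] in GLY_CODONS


def ensure_met_gly_start(coding_seq: str, gly_codon: str = "GGC") -> str:
    """Return an in-frame sequence that starts with Met-Gly codons."""
    seq = coding_seq.upper()
    if len(seq) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    if gly_codon not in GLY_CODONS:
        raise ValueError(f"{gly_codon!r} is not a glycine codon")
    if has_met_gly_start(seq):
        return seq
    if seq.startswith("ATG"):
        # keep the existing initiator Met and insert Gly directly after it
        return seq[:3] + gly_codon + seq[3:]
    # otherwise prepend both codons
    return "ATG" + gly_codon + seq
```

Because whole codons are added, the downstream reading frame is preserved.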

The ability of Gag-containing polypeptide constructs to form VLPs can be empirically
determined following the teachings of the present specification.

The synthetic expression cassettes can also include control elements operably linked
to the coding sequence, which allow for the expression of the gene in vivo in the subject
species. For example, typical promoters for mammalian cell expression include the SV40
early promoter, a CMV promoter such as the CMV immediate early promoter, the mouse
mammary tumor virus LTR promoter, the adenovirus major late promoter (Ad MLP), and the
herpes simplex virus promoter, among others. Other nonviral promoters, such as a promoter
derived from the murine metallothionein gene, will also find use for mammalian expression.

Typically, transcription termination and polyadenylation sequences will also be present,
located 3' to the translation stop codon. Preferably, a sequence for optimization of initiation
of translation, located 5' to the coding sequence, is also present. Examples of transcription
terminator/polyadenylation signals include those derived from SV40, as described in
Sambrook et al., supra, as well as a bovine growth hormone terminator sequence.

Enhancer elements may also be used herein to increase expression levels of the
mammalian constructs. Examples include the SV40 early gene enhancer, as described in
Dijkema et al., EMBO J. (1985) 4:761, the enhancer/promoter derived from the long terminal
repeat (LTR) of the Rous Sarcoma Virus, as described in Gorman et al., Proc. Natl. Acad.
Sci. USA (1982b) 79:6777, and elements derived from human CMV, as described in Boshart
et al., Cell (1985) 41:521, such as elements included in the CMV intron A sequence.

Furthermore, plasmids can be constructed which include chimeric antigen-coding
gene sequences encoding, e.g., multiple antigens/epitopes of interest, for example derived
from more than one viral isolate.

Typically the antigen coding sequences precede or follow the synthetic coding
sequence and the chimeric transcription unit will have a single open reading frame encoding
both the antigen of interest and the synthetic coding sequences. Alternatively, multi-cistronic
cassettes (e.g., bi-cistronic cassettes) can be constructed allowing expression of multiple
antigens from a single mRNA using the EMCV IRES, or the like.
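
Building a chimeric transcription unit with "a single open reading frame" requires that the joined sequences stay in frame and that no stop codon interrupts the fusion. The Python sketch below assumes both inputs are in-frame coding sequences (lengths divisible by 3), removes a terminal stop codon from the upstream sequence, and rejects fusions containing a premature stop; the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuse_in_frame(upstream: str, downstream: str) -> str:
    """Join two coding sequences into one open reading frame.

    Drops a terminal stop codon from `upstream`, then checks that the fused
    sequence has no stop codon before its last codon.
    """
    up, down = upstream.upper(), downstream.upper()
    for seq in (up, down):
        if len(seq) % 3:
            raise ValueError("sequence length is not a multiple of 3")
    if up[-3:] in STOP_CODONS:
        up = up[:-3]
    fused = up + down
    codons = [fused[i:i + 3] for i in range(0, len(fused), 3)]
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        raise ValueError("premature stop codon in fused reading frame")
    return fused
```

For example, fusing `"ATGGCCTAA"` with `"ATGAAATGA"` yields `"ATGGCCATGAAATGA"`, a single reading frame ending in the downstream stop codon.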

Once complete, the constructs are used for nucleic acid immunization using standard
gene delivery protocols. Methods for gene delivery are known in the art. See, e.g., U.S.
Patent Nos. 5,399,346, 5,580,859, 5,589,466. Genes can be delivered either directly to the
vertebrate subject or, alternatively, delivered ex vivo, to cells derived from the subject and the
cells reimplanted in the subject.

A number of viral based systems have been developed for gene transfer into
mammalian cells. For example, retroviruses provide a convenient platform for gene delivery
systems. Selected sequences can be inserted into a vector and packaged in retroviral particles
using techniques known in the art. The recombinant virus can then be isolated and delivered
to cells of the subject either in vivo or ex vivo. A number of retroviral systems have been
described (U.S. Patent No. 5,219,740; Miller and Rosman, BioTechniques (1989) 7:980-990;
Miller, A.D., Human Gene Therapy (1990) 1:5-14; Scarpa et al., Virology (1991) 180:849-852;
Burns et al., Proc. Natl. Acad. Sci. USA (1993) 90:8033-8037; and Boris-Lawrie and
Temin, Cur. Opin. Genet. Develop. (1993) 3:102-109).

A number of adenovirus vectors have also been described. Unlike retroviruses, which
integrate into the host genome, adenoviruses persist extrachromosomally, thus minimizing the
risks associated with insertional mutagenesis (Haj-Ahmad and Graham, J. Virol. (1986)
57:267-274; Bett et al., J. Virol. (1993) 67:5911-5921; Mittereder et al., Human Gene
Therapy (1994) 5:717-729; Seth et al., J. Virol. (1994) 68:933-940; Bair et al., Gene Therapy
(1994) 1:51-58; Berkner, K.L. BioTechniques (1988) 6:616-629; and Rich et al., Human
Gene Therapy (1993) 4:461-476).

Additionally, various adeno-associated virus (AAV) vector systems have been
developed for gene delivery. AAV vectors can be readily constructed using techniques well
known in the art. See, e.g., U.S. Patent Nos. 5,173,414 and 5,139,941; International
Publication Nos. WO 92/01070 (published 23 January 1992) and WO 93/03769 (published 4
March 1993); Lebkowski et al., Molec. Cell. Biol. (1988) 8:3988-3996; Vincent et al.,
Vaccines 90 (1990) (Cold Spring Harbor Laboratory Press); Carter, B.J. Current Opinion in
Biotechnology (1992) 3:533-539; Muzyczka, N. Current Topics in Microbiol. and Immunol.
(1992) 158:97-129; Kotin, R.M. Human Gene Therapy (1994) 5:793-801; Shelling and
Smith, Gene Therapy (1994) 1:165-169; and Zhou et al., J. Exp. Med. (1994) 179:1867-1875.

Another vector system useful for delivering the polynucleotides of the present
invention is provided by the enterically administered recombinant poxvirus vaccines described
by Small, Jr., P.A., et al. (U.S. Patent No. 5,676,950, issued October 14, 1997).

Additional viral vectors which will find use for delivering the nucleic acid molecules
encoding the antigens of interest include those derived from the pox family of viruses,
including vaccinia virus and avian poxvirus. By way of example, vaccinia virus
recombinants expressing the genes can be constructed as follows. The DNA encoding the
particular synthetic HIV subtype C polypeptide coding sequence is first inserted into an
appropriate vector so that it is adjacent to a vaccinia promoter and flanking vaccinia DNA
sequences, such as the sequence encoding thymidine kinase (TK). This vector is then used to
transfect cells which are simultaneously infected with vaccinia. Homologous recombination
serves to insert the vaccinia promoter plus the gene encoding the coding sequences of interest
into the viral genome. The resulting TK- recombinant can be selected by culturing the cells in
the presence of 5-bromodeoxyuridine and picking viral plaques resistant thereto.

Alternatively, avipoxviruses, such as the fowlpox and canarypox viruses, can also be 
used to deliver the genes. Recombinant avipox viruses, expressing immunogens from 
mammalian pathogens, are known to confer protective immunity when administered to non-
avian species. The use of an avipox vector is particularly desirable in human and other 
mammalian species since members of the avipox genus can only productively replicate in 
susceptible avian species and therefore are not infective in mammalian cells. Methods for 
producing recombinant avipoxviruses are known in the art and employ genetic 
recombination, as described above with respect to the production of vaccinia viruses. See, 
e.g., WO 91/12882; WO 89/03429; and WO 92/03545. 

Molecular conjugate vectors, such as the adenovirus chimeric vectors described in 
Michael et al., J. Biol. Chem. (1993) 268:6866-6869 and Wagner et al., Proc. Natl. Acad. Sci. 
USA (1992) 89:6099-6103, can also be used for gene delivery. 

Members of the Alphavirus genus, such as, but not limited to, vectors derived from 
the Sindbis, Semliki Forest, and Venezuelan Equine Encephalitis viruses, will also find use as 
viral vectors for delivering the polynucleotides of the present invention (for example, a 
synthetic Gag-polypeptide encoding expression cassette). For a description of Sindbis-virus 
derived vectors useful for the practice of the instant methods, see, Dubensky et al., J. Virol. 
(1996) 70:508-519; and International Publication Nos. WO 95/07995 and WO 96/17072; as 
well as, Dubensky, Jr., T.W., et al., U.S. Patent No. 5,843,723, issued December 1, 1998, and 
Dubensky, Jr., T.W., U.S. Patent No. 5,789,245, issued August 4, 1998. 

A vaccinia based infection/transfection system can be conveniently used to provide 
for inducible, transient expression of the coding sequences of interest in a host cell. In this 
system, cells are first infected in vitro with a vaccinia virus recombinant that encodes the 
bacteriophage T7 RNA polymerase. This polymerase displays exquisite specificity in that it 
only transcribes templates bearing T7 promoters. Following infection, cells are transfected 
with the polynucleotide of interest, driven by a T7 promoter. The polymerase expressed in 
the cytoplasm from the vaccinia virus recombinant transcribes the transfected DNA into RNA 
which is then translated into protein by the host translational machinery. The method 
provides for high level, transient, cytoplasmic production of large quantities of RNA and its 
translation products. See, e.g., Elroy-Stein and Moss, Proc. Natl. Acad. Sci. USA (1990)
87:6743-6747; Fuerst et al., Proc. Natl. Acad. Sci. USA (1986) 83:8122-8126.
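
The specificity of T7 RNA polymerase for its own promoter can be illustrated with a short sketch that checks whether a candidate template carries a T7 promoter and would therefore be transcribed in this system. The sketch assumes the widely used 18-nt T7 consensus sequence TAATACGACTCACTATAG (transcription initiating at the final G) and exact string matching; `find_t7_promoters` and the example construct are hypothetical, and real promoter recognition tolerates some sequence variation.

```python
from typing import List, Tuple

# Widely used T7 consensus promoter (assumption); transcription starts at the final G (+1).
T7_PROMOTER = "TAATACGACTCACTATAG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (non-ACGT characters are left unchanged)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_t7_promoters(template: str) -> List[Tuple[str, int]]:
    """Return (strand, 0-based start on that strand) for each exact promoter match."""
    hits = []
    for strand, seq in (("+", template.upper()), ("-", reverse_complement(template))):
        start = seq.find(T7_PROMOTER)
        while start != -1:
            hits.append((strand, start))
            start = seq.find(T7_PROMOTER, start + 1)
    return hits
```

For a hypothetical construct `"GG" + T7_PROMOTER + "GGATGAAA"`, the function returns `[("+", 2)]`; a template lacking the promoter returns an empty list and, in this system, would not be transcribed.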

As an alternative approach to infection with vaccinia or avipox virus recombinants, or
to the delivery of genes using other viral vectors, an amplification system can be used that
will lead to high-level expression following introduction into host cells. Specifically, a T7
RNA polymerase promoter preceding the coding region for T7 RNA polymerase can be
engineered. Translation of RNA derived from this template will generate T7 RNA
polymerase, which in turn will transcribe more template. Concomitantly, there will be a
cDNA whose expression is under the control of the T7 promoter. Thus, some of the T7 RNA
polymerase generated from translation of the amplification template RNA will lead to
transcription of the desired gene. Because some T7 RNA polymerase is required to initiate
the amplification, T7 RNA polymerase can be introduced into cells along with the template(s)
to prime the transcription reaction. The polymerase can be introduced as a protein or on a
plasmid encoding the RNA polymerase. For a further discussion of T7 systems and their use
for transforming cells, see, e.g., International Publication No. WO 94/26911; Studier and
Moffatt, J. Mol. Biol. (1986) 189:113-130; Deng and Wolff, Gene (1994) 143:245-249; Gao
et al., Biochem. Biophys. Res. Commun. (1994) 200:1201-1206; Gao and Huang, Nuc. Acids
Res. (1993) 21:2867-2872; Chen et al., Nuc. Acids Res. (1994) 22:2114-2120; and U.S. Patent
No. 5,135,855.
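
The positive-feedback logic of the amplification system, and why some polymerase must be supplied to prime it, can be seen in a toy discrete-time model. This is a minimal sketch under stated assumptions: the rate constants, the split of polymerase activity between the autogene template and the target cDNA (`target_fraction`), and the function name are all hypothetical, and the model ignores RNA and protein turnover.

```python
def simulate_t7_amplification(initial_polymerase, steps, k_transcription=0.5,
                              k_translation=0.5, target_fraction=0.5):
    """Toy model of a T7 autogene loop (all parameters hypothetical).

    Each step, T7 polymerase transcribes both the autogene template
    (T7 promoter -> T7 polymerase gene) and the target cDNA
    (T7 promoter -> gene of interest). Only autogene transcripts are
    translated into additional polymerase.
    """
    polymerase = float(initial_polymerase)
    autogene_rna = 0.0
    target_rna = 0.0
    history = []
    for _ in range(steps):
        new_rna = k_transcription * polymerase  # transcripts made this step
        autogene_rna += (1.0 - target_fraction) * new_rna
        target_rna += target_fraction * new_rna
        # feedback: new autogene RNA is translated into more T7 polymerase
        polymerase += k_translation * (1.0 - target_fraction) * new_rna
        history.append((polymerase, autogene_rna, target_rna))
    return history
```

With no starting polymerase every quantity stays at zero, whereas any priming amount grows geometrically by a factor of 1 + k_translation * (1 - target_fraction) * k_transcription per step, with target transcripts accumulating alongside.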

Synthetic expression cassettes of interest can also be delivered without a viral vector.
For example, the synthetic expression cassette can be packaged in liposomes prior to delivery
to the subject or to cells derived therefrom. Lipid encapsulation is generally accomplished
using liposomes that are able to stably bind or entrap and retain nucleic acid. The ratio of
condensed DNA to lipid preparation can vary but will generally be around 1:1 (mg
DNA:micromoles lipid), or more of lipid. For a review of the use of liposomes as carriers for
delivery of nucleic acids, see Hug and Sleight, Biochim. Biophys. Acta (1991) 1097:1-17;
Straubinger et al., in Methods of Enzymology (1983), Vol. 101, pp. 512-527.
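
The DNA-to-lipid ratio above is stated in mixed units (mg DNA : micromoles lipid), so the lipid quantity for a given DNA dose follows directly, and converting it to a mass needs the lipid's molecular weight. A minimal sketch; the function name and the 700 g/mol molecular weight used in the tests are illustrative assumptions, not values from the text.

```python
from typing import Optional, Tuple


def lipid_for_dna(dna_mg: float, umol_lipid_per_mg_dna: float = 1.0,
                  lipid_mw_g_per_mol: Optional[float] = None) -> Tuple[float, Optional[float]]:
    """Return (lipid in micromoles, lipid mass in micrograms or None).

    The default ratio of 1.0 corresponds to the ~1:1 (mg DNA : umol lipid)
    ratio in the text; larger values give the "more lipid" end of the range.
    """
    if dna_mg < 0 or umol_lipid_per_mg_dna <= 0:
        raise ValueError("DNA amount must be non-negative and the ratio positive")
    lipid_umol = dna_mg * umol_lipid_per_mg_dna
    # 1 umol of a lipid with molecular weight M g/mol weighs M micrograms.
    lipid_ug = lipid_umol * lipid_mw_g_per_mol if lipid_mw_g_per_mol else None
    return lipid_umol, lipid_ug
```

For example, 0.5 mg of DNA at 1:1 calls for 0.5 µmol of lipid, which for a hypothetical lipid of 700 g/mol is 350 µg.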

Liposomal preparations for use in the present invention include cationic (positively
charged), anionic (negatively charged) and neutral preparations, with cationic liposomes
particularly preferred. Cationic liposomes have been shown to mediate intracellular delivery
of plasmid DNA (Felgner et al., Proc. Natl. Acad. Sci. USA (1987) 84:7413-7416), mRNA
(Malone et al., Proc. Natl. Acad. Sci. USA (1989) 86:6077-6081) and purified transcription
factors (Debs et al., J. Biol. Chem. (1990) 265:10189-10192) in functional form.

Cationic liposomes are readily available. For example,
N-[1-(2,3-dioleyloxy)propyl]-N,N,N-trimethylammonium (DOTMA) liposomes are available
under the trademark Lipofectin from GIBCO BRL, Grand Island, NY. (See also Felgner et al.,
Proc. Natl. Acad. Sci. USA (1987) 84:7413-7416.) Other commercially available lipids include
DDAB/DOPE and DOTAP/DOPE (Boehringer). Other cationic liposomes can be prepared from
readily available materials using techniques well known in the art. See, e.g., Szoka et al.,
Proc. Natl. Acad. Sci. USA (1978) 75:4194-4198; and PCT Publication No. WO 90/11092 for a
description of the synthesis of DOTAP (1,2-bis(oleoyloxy)-3-(trimethylammonio)propane)
liposomes.

Similarly, anionic and neutral liposomes are readily available, such as from Avanti
Polar Lipids (Birmingham, AL), or can be easily prepared using readily available materials.
Such materials include phosphatidyl choline, cholesterol, phosphatidyl ethanolamine,
dioleoylphosphatidyl choline (DOPC), dioleoylphosphatidyl glycerol (DOPG) and
dioleoylphosphatidyl ethanolamine (DOPE), among others. These materials can also be mixed
with the DOTMA and DOTAP starting materials in appropriate ratios. Methods for making
liposomes using these materials are well known in the art.

The liposomes can comprise multilamellar vesicles (MLVs), small unilamellar
vesicles (SUVs), or large unilamellar vesicles (LUVs). The various liposome-nucleic acid
complexes are prepared using methods known in the art. See, e.g., Straubinger et al., in
Methods of Enzymology (1983), Vol. 101, pp. 512-527; Szoka et al., Proc. Natl.
Acad. Sci. USA (1978) 75:4194-4198; Papahadjopoulos et al., Biochim. Biophys. Acta (1975)
394:483; Wilson et al., Cell (1979) 17:77; Deamer and Bangham, Biochim. Biophys. Acta
(1976) 443:629; Ostro et al., Biochem. Biophys. Res. Commun. (1977) 76:836; Fraley et al.,
Proc. Natl. Acad. Sci. USA (1979) 76:3348; Enoch and Strittmatter, Proc. Natl. Acad. Sci.
USA (1979) 76:145; Fraley et al., J. Biol. Chem. (1980) 255:10431; Szoka and
Papahadjopoulos, Proc. Natl. Acad. Sci. USA (1978) 75:145; and Schaefer-Ridder et al.,
Science (1982) 215:166.

The DNA and/or protein antigen(s) can also be delivered in cochleate lipid
compositions similar to those described by Papahadjopoulos et al., Biochim. Biophys. Acta
(1975) 394:483-491. See also U.S. Patent Nos. 4,663,161 and 4,871,488.

The synthetic expression cassette of interest may also be encapsulated in, adsorbed to, or
associated with particulate carriers. Such carriers present multiple copies of a selected
antigen to the immune system and promote trapping and retention of antigens in local lymph
nodes. The particles can be phagocytosed by macrophages and can enhance antigen
presentation through cytokine release. Examples of particulate carriers include those derived
from polymethyl methacrylate polymers, as well as microparticles derived from
poly(lactides) and poly(lactide-co-glycolides), known as PLG. See, e.g., Jeffery et al.,
Pharm. Res. (1993) 10:362-368; McGee JP, et al., J Microencapsul 14(2):197-210, 1997;
O'Hagan DT, et al., Vaccine 11(2):149-54, 1993. Suitable microparticles may also be
manufactured in the presence of charged detergents, such as anionic or cationic detergents, to
yield microparticles with a surface having a net negative or a net positive charge. For
example, microparticles manufactured with cationic detergents, such as
hexadecyltrimethylammonium bromide (CTAB), i.e., CTAB-PLG microparticles, adsorb
negatively charged macromolecules, such as DNA (see, e.g., Int'l Application Number
PCT/US99/17308).

Furthermore, other particulate systems and polymers can be used for the in vivo or ex
vivo delivery of the gene of interest. For example, polymers such as polylysine, polyarginine,
polyornithine, spermine and spermidine, as well as conjugates of these molecules, are useful for
transferring a nucleic acid of interest. Similarly, DEAE dextran-mediated transfection,
calcium phosphate precipitation or precipitation using other insoluble inorganic salts, such as
strontium phosphate, aluminum silicates including bentonite and kaolin, chromic oxide,
magnesium silicate, talc, and the like, will find use with the present methods. See, e.g.,
Felgner, P.L., Advanced Drug Delivery Reviews (1990) 5:163-187, for a review of delivery
systems useful for gene transfer. Peptoids (Zuckerman, R.N., et al., U.S. Patent No.
5,831,005, issued November 3, 1998) may also be used for delivery of a construct of the
present invention.

Additionally, biolistic delivery systems employing particulate carriers such as gold
and tungsten are especially useful for delivering synthetic expression cassettes of the present
invention. The particles are coated with the synthetic expression cassette(s) to be delivered
and accelerated to high velocity, generally under a reduced atmosphere, using a gunpowder
discharge from a "gene gun." For a description of such techniques, and apparatuses useful
therefor, see, e.g., U.S. Patent Nos. 4,945,050; 5,036,006; 5,100,792; 5,179,022; 5,371,015;
and 5,478,744. Needle-less injection systems can also be used (Davis, H.L., et al., Vaccine
12:1503-1509, 1994; Bioject, Inc., Portland, OR).

Recombinant vectors carrying a synthetic expression cassette of the present invention
are formulated into compositions for delivery to the vertebrate subject. These compositions
may be either prophylactic (to prevent infection) or therapeutic (to treat disease after
infection). The compositions will comprise a "therapeutically effective amount" of the gene
of interest such that an amount of the antigen can be produced in vivo so that an immune
response is generated in the individual to which it is administered. The exact amount
necessary will vary depending on the subject being treated; the age and general condition of
the subject to be treated; the capacity of the subject's immune system to synthesize
antibodies; the degree of protection desired; the severity of the condition being treated; and the
particular antigen selected and its mode of administration, among other factors. An
appropriate effective amount can be readily determined by one of skill in the art. Thus, a
"therapeutically effective amount" will fall in a relatively broad range that can be determined
through routine trials.
The compositions will generally include one or more "pharmaceutically acceptable
excipients or vehicles" such as water, saline, glycerol, polyethyleneglycol, hyaluronic acid,
ethanol, etc. Additionally, auxiliary substances, such as wetting or emulsifying agents, pH
buffering substances, and the like, may be present in such vehicles. Certain facilitators of
nucleic acid uptake and/or expression can also be included in the compositions or
coadministered, such as, but not limited to, bupivacaine, cardiotoxin and sucrose.

Once formulated, the compositions of the invention can be administered directly to 
the subject (e.g., as described above) or, alternatively, delivered ex vivo, to cells derived from 
the subject, using methods such as those described above. For example, methods for the ex 
vivo delivery and reimplantation of transformed cells into a subject are known in the art and 
can include, e.g., dextran-mediated transfection, calcium phosphate precipitation, polybrene 
mediated transfection, lipofectamine and LT-1 mediated transfection, protoplast fusion, 
electroporation, encapsulation of the polynucleotide(s) (with or without the corresponding 
antigen) in liposomes, and direct microinjection of the DNA into nuclei. 

Direct delivery of synthetic expression cassette compositions in vivo will generally be 
accomplished with or without viral vectors, as described above, by injection using either a 
conventional syringe or a gene gun, such as the Accell® gene delivery system (PowderJect 
Technologies, Inc., Oxford, England). The constructs can be injected either subcutaneously, 
epidermally, intradermally, intramucosally such as nasally, rectally and vaginally, 
intraperitoneally, intravenously, orally or intramuscularly. Delivery of DNA into cells of the 
epidermis is particularly preferred as this mode of administration provides access to skin- 
associated lymphoid cells and provides for a transient presence of DNA in the recipient. 
Other modes of administration include oral and pulmonary administration, suppositories, 
needle-less injection, transcutaneous and transdermal applications. Dosage treatment may be 
a single dose schedule or a multiple dose schedule. Administration of nucleic acids may also 
be combined with administration of peptides or other substances. 

2.4.2 Ex vivo Delivery of the Synthetic Expression Cassettes of the Present Invention

In one embodiment, T cells and related cell types (including but not limited to
antigen presenting cells, such as macrophages, monocytes, lymphoid cells, dendritic cells,
B-cells, T-cells, stem cells, and progenitor cells thereof) can be used for ex vivo delivery of the
synthetic expression cassettes of the present invention. T cells can be isolated from
peripheral blood lymphocytes (PBLs) by a variety of procedures known to those skilled in the
art. For example, T cell populations can be "enriched" from a population of PBLs through
the removal of accessory and B cells. In particular, T cell enrichment can be accomplished
by the elimination of non-T cells using anti-MHC class II monoclonal antibodies. Similarly,
other antibodies can be used to deplete specific populations of non-T cells. For example,
anti-Ig antibody molecules can be used to deplete B cells and anti-Mac-1 antibody molecules
can be used to deplete macrophages.

T cells can be further fractionated into a number of different subpopulations by
techniques known to those skilled in the art. Two major subpopulations can be isolated based
on their differential expression of the cell surface markers CD4 and CD8. For example,
following the enrichment of T cells as described above, CD4+ cells can be enriched using
antibodies specific for CD4 (see Coligan et al., supra). The antibodies may be coupled to a
solid support such as magnetic beads. Conversely, CD8+ cells can be enriched through the
use of antibodies specific for CD4 (to remove CD4+ cells), or can be isolated by the use of
CD8 antibodies coupled to a solid support. CD4 lymphocytes from HIV-1 infected patients
can be expanded ex vivo, before or after transduction, as described by Wilson et al. (1995) J.
Infect. Dis. 172:88.

Following purification of T cells, a variety of methods of genetic modification known
to those skilled in the art can be performed using non-viral or viral-based gene transfer 
vectors constructed as described herein. For example, one such approach involves 
transduction of the purified T cell population with vector-containing supernatant of cultures 
derived from vector producing cells. A second approach involves co-cultivation of an 
irradiated monolayer of vector-producing cells with the purified T cells. A third approach 
involves a similar co-cultivation approach; however, the purified T cells are pre-stimulated 
with various cytokines and cultured 48 hours prior to the co-cultivation with the irradiated 
vector producing cells. Pre-stimulation prior to such transduction increases effective gene 
transfer (Nolta et al. (1992) Exp. Hematol. 20:1065). Stimulation of these cultures to 
proliferate also provides increased cell populations for re-infusion into the patient. 
Subsequent to co-cultivation, T cells are collected from the vector producing cell monolayer, 
expanded, and frozen in liquid nitrogen. 

Gene transfer vectors containing one or more synthetic expression cassettes of the
present invention (associated with appropriate control elements for delivery to the isolated T 
cells) can be assembled using known methods. 

Selectable markers can also be used in the construction of gene transfer vectors. For
example, a marker can be used which imparts to a mammalian cell transduced with the gene
transfer vector resistance to a cytotoxic agent. The cytotoxic agent can be, but is not limited
to, neomycin, aminoglycoside, tetracycline, chloramphenicol, sulfonamide, actinomycin,
netropsin, distamycin A, anthracycline, or pyrazinamide. For example, neomycin
phosphotransferase II imparts resistance to the neomycin analogue geneticin (G418).

The T cells can also be maintained in a medium containing at least one type of growth 
factor prior to being selected. A variety of growth factors are known in the art which sustain 
the growth of a particular cell type. Examples of such growth factors are cytokine mitogens 
such as rIL-2, IL-10, IL-12, and IL-15, which promote growth and activation of lymphocytes.
Certain types of cells are stimulated by other growth factors such as hormones, including
human chorionic gonadotropin (hCG) and human growth hormone. The selection of an
appropriate growth factor for a particular cell population is readily accomplished by one of
skill in the art.

For example, white blood cells such as differentiated progenitor and stem cells are 
stimulated by a variety of growth factors. More particularly, IL-3, IL-4, IL-5, IL-6, IL-9, 
GM-CSF, M-CSF, and G-CSF, produced by activated TH cells and activated macrophages,
stimulate myeloid stem cells, which then differentiate into pluripotent stem cells,
granulocyte-monocyte progenitors, eosinophil progenitors, basophil progenitors,
megakaryocytes, and erythroid progenitors. Differentiation is modulated by growth factors
such as GM-CSF, IL-3, IL-6, IL-11, and EPO.

Pluripotent stem cells then differentiate into lymphoid stem cells, bone marrow 
stromal cells, T cell progenitors, B cell progenitors, thymocytes, TH cells, TC cells, and B
cells. This differentiation is modulated by growth factors such as IL-3, IL-4, IL-6, IL-7,
GM-CSF, M-CSF, G-CSF, IL-2, and IL-5.

Granulocyte-monocyte progenitors differentiate to monocytes, macrophages, and 
neutrophils. Such differentiation is modulated by the growth factors GM-CSF, M-CSF, and 
IL-8. Eosinophil progenitors differentiate into eosinophils. This process is modulated by 
GM-CSF and IL-5. 

The differentiation of basophil progenitors into mast cells and basophils is modulated 
by GM-CSF, IL-4, and IL-9. Megakaryocytes produce platelets in response to GM-CSF, 
EPO, and IL-6. Erythroid progenitor cells differentiate into red blood cells in response to 
EPO. 
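
The differentiation relationships described in the preceding paragraphs can be collected into a small lookup graph, which makes it easy to ask which of the named growth factors act along a given route. The node names and the `factors_along_path` helper are hypothetical; the graph transcribes only the relationships stated above (using the differentiation-modulating factors for each step) and is not a complete model of hematopoiesis.

```python
from collections import deque

# cell type -> (products it differentiates into, growth factors modulating that step),
# transcribed from the text above.
HIERARCHY = {
    "myeloid stem cell": (
        ["pluripotent stem cell", "granulocyte-monocyte progenitor",
         "eosinophil progenitor", "basophil progenitor",
         "megakaryocyte", "erythroid progenitor"],
        {"GM-CSF", "IL-3", "IL-6", "IL-11", "EPO"}),
    "pluripotent stem cell": (
        ["lymphoid stem cell", "bone marrow stromal cell", "T cell progenitor",
         "B cell progenitor", "thymocyte", "TH cell", "TC cell", "B cell"],
        {"IL-3", "IL-4", "IL-6", "IL-7", "GM-CSF", "M-CSF", "G-CSF", "IL-2", "IL-5"}),
    "granulocyte-monocyte progenitor": (
        ["monocyte", "macrophage", "neutrophil"], {"GM-CSF", "M-CSF", "IL-8"}),
    "eosinophil progenitor": (["eosinophil"], {"GM-CSF", "IL-5"}),
    "basophil progenitor": (["mast cell", "basophil"], {"GM-CSF", "IL-4", "IL-9"}),
    "megakaryocyte": (["platelet"], {"GM-CSF", "EPO", "IL-6"}),
    "erythroid progenitor": (["red blood cell"], {"EPO"}),
}


def factors_along_path(start, target):
    """Breadth-first search; return (path, union of factors) or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            factors = set()
            for parent in path[:-1]:
                factors |= HIERARCHY[parent][1]
            return path, factors
        for child in HIERARCHY.get(node, ([], set()))[0]:
            if child not in seen:
                seen.add(child)
                queue.append(path + [child])
    return None
```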

Thus, during activation by the CD3-binding agent, T cells can also be contacted with 
a mitogen, for example a cytokine such as IL-2. In particularly preferred embodiments, the 
IL-2 is added to the population of T cells at a concentration of about 50 to 100 ng/ml. 
Activation with the CD3-binding agent can be carried out for 2 to 4 days. 
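
Adding IL-2 to a culture at about 50 to 100 ng/ml is a simple dilution calculation. The sketch below solves C_stock * v = C_target * (V + v), so the small volume of stock added is accounted for; the stock concentration, culture volume and 75 ng/ml default are hypothetical example values, and treating the stated range as a hard limit is a design choice of this sketch.

```python
def il2_stock_volume_ul(culture_ml: float, stock_ng_per_ml: float,
                        target_ng_per_ml: float = 75.0) -> float:
    """Microliters of IL-2 stock to add so the culture reaches target_ng_per_ml."""
    if not 50.0 <= target_ng_per_ml <= 100.0:
        raise ValueError("target is outside the ~50-100 ng/ml range in the text")
    if stock_ng_per_ml <= target_ng_per_ml:
        raise ValueError("stock must be more concentrated than the target")
    # Solve C_stock * v = C_target * (V + v) for v (in ml), then convert to microliters.
    volume_ml = target_ng_per_ml * culture_ml / (stock_ng_per_ml - target_ng_per_ml)
    return volume_ml * 1000.0
```

For a hypothetical 10 ml culture and a 100 µg/ml (100,000 ng/ml) stock, this gives about 7.5 µl of stock.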

Once suitably activated, the T cells are genetically modified by contacting them
with a suitable gene transfer vector under conditions that allow for transfection of the vectors
into the T cells. Genetic modification is carried out when the cell density of the T cell
population is between about 0.1 x 10^6 and 5 x 10^6, preferably between about 0.5 x 10^6 and
2 x 10^6. A number of suitable viral and nonviral-based gene transfer vectors have been
described for use herein.
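
The density window for genetic modification can be expressed as a simple check, together with the volume of medium needed to dilute a denser culture into the preferred window. The text does not state the units; this sketch assumes cells per ml, and both function names and the 1 x 10^6 target are hypothetical.

```python
def classify_density(cells_per_ml: float) -> str:
    """Place a T cell density within the ranges given in the text (units assumed)."""
    if 0.5e6 <= cells_per_ml <= 2e6:
        return "preferred"
    if 0.1e6 <= cells_per_ml <= 5e6:
        return "acceptable"
    return "outside range"


def medium_to_add_ml(cells_per_ml: float, volume_ml: float,
                     target_per_ml: float = 1e6) -> float:
    """Volume of fresh medium that dilutes a denser culture down to target_per_ml."""
    if cells_per_ml < target_per_ml:
        raise ValueError("culture is below target; concentrate instead of diluting")
    return cells_per_ml * volume_ml / target_per_ml - volume_ml
```

For example, 5 ml at 4 x 10^6 cells/ml needs 15 ml of added medium to reach 1 x 10^6 cells/ml.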

After transduction, transduced cells are selected away from non-transduced cells using 
known techniques. For example, if the gene transfer vector used in the transduction includes 
a selectable marker which confers resistance to a cytotoxic agent, the cells can be contacted 
with the appropriate cytotoxic agent, whereby non-transduced cells can be negatively selected 
away from the transduced cells. If the selectable marker is a cell surface marker, the cells can 
be contacted with a binding agent specific for the particular cell surface marker, whereby the 
transduced cells can be positively selected away from the population. The selection step can 
also entail fluorescence-activated cell sorting (FACS) techniques, such as where FACS is 
used to select cells from the population containing a particular surface marker, or the 
selection step can entail the use of magnetically responsive particles as retrievable supports 
for target cell capture and/or background removal. 

More particularly, positive selection of the transduced cells can be performed using a 
FACS cell sorter (e.g. a FACSVantage™ Cell Sorter, Becton Dickinson Immunocytometry 
Systems, San Jose, CA) to sort and collect transduced cells expressing a selectable cell 
surface marker. Following transduction, the cells are stained with fluorescent-labeled 
antibody molecules directed against the particular cell surface marker. The amount of bound 
antibody on each cell can be measured by passing droplets containing the cells through the 
cell sorter. By imparting an electromagnetic charge to droplets containing the stained cells, 
the transduced cells can be separated from other cells. The positively selected cells are then 
harvested in sterile collection vessels. These cell sorting procedures are described in detail, 
for example, in the FACSVantage™ Training Manual, with particular reference to sections
3-11 to 3-28 and 10-1 to 10-17.
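
The gating step of FACS-based positive selection amounts to comparing each cell's fluorescence with a threshold. The text does not say how the threshold is chosen; this sketch assumes, as one common convention, that it is set at a high percentile (here the 99th, nearest-rank) of a non-transduced control stained the same way. Function names, cell IDs and signal values are hypothetical, and real sorters apply gates in instrument software rather than in code like this.

```python
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (0 < pct <= 100) of a non-empty list of numbers."""
    if not values:
        raise ValueError("need at least one control measurement")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def gate_positive(sample, control_signals, pct=99.0):
    """Return IDs of sample cells brighter than the control-derived threshold."""
    threshold = nearest_rank_percentile(control_signals, pct)
    positives = [cell_id for cell_id, signal in sample if signal > threshold]
    return positives, threshold
```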

Positive selection of the transduced cells can also be performed using magnetic
separation of cells based on expression of a particular cell surface marker. In such separation
techniques, cells to be positively selected are first contacted with a specific binding agent
(e.g., an antibody or reagent that interacts specifically with the cell surface marker). The cells
are then contacted with retrievable particles (e.g., magnetically responsive particles) which are
coupled with a reagent that binds the specific binding agent (that has bound to the positive
cells). The cell-binding agent-particle complex can then be physically separated from
non-labeled cells, for example using a magnetic field. When using magnetically responsive
particles, the labeled cells can be retained in a container using a magnetic field while the
negative cells are removed. These and similar separation procedures are known to those of
ordinary skill in the art.

Expression of the vector in the selected transduced cells can be assessed by a number 
of assays known to those skilled in the art. For example, Western blot or Northern analysis 
can be employed depending on the nature of the inserted nucleotide sequence of interest. 
Once expression has been established and the transformed T cells have been tested for the 
presence of the selected synthetic expression cassette, they are ready for infusion into a 
patient via the peripheral blood stream. 

The invention includes a kit for genetic modification of an ex vivo population of 
primary mammalian cells. The kit typically contains a gene transfer vector coding for at least 
one selectable marker and at least one synthetic expression cassette contained in one or more 
containers, ancillary reagents or hardware, and instructions for use of the kit. 

2.4.3 Further Delivery regimes 

Any of the polynucleotides (e.g., expression cassettes) or polypeptides described 
herein (delivered by any of the methods described above) can also be used in combination 
with other DNA delivery systems and/or protein delivery systems. Non-limiting examples 
include co-administration of these molecules, for example, in prime-boost methods where one 
or more molecules are delivered in a "priming" step and, subsequently, one or more
molecules are delivered in a "boosting" step. In certain embodiments, the delivery of one or
more nucleic acid-containing compositions is followed by delivery of one or more
nucleic acid-containing compositions and/or one or more polypeptide-containing
compositions (e.g., polypeptides comprising HIV antigens). In other embodiments, multiple 
nucleic acid "primes" (of the same or different nucleic acid molecules) can be followed by 
multiple polypeptide "boosts" (of the same or different polypeptides). Other examples 
include multiple nucleic acid administrations and multiple polypeptide administrations. 

In any method involving co-administration, the various compositions can be 
delivered in any order. Thus, in embodiments including delivery of multiple different 
compositions or molecules, the nucleic acids need not be all delivered before the 
polypeptides. For example, the priming step may include delivery of one or more
polypeptides and the boosting step may comprise delivery of one or more nucleic acids
and/or one or more polypeptides. Multiple polypeptide administrations can be followed by
multiple nucleic acid administrations, or polypeptide and nucleic acid administrations can be
performed in any order. In any of the embodiments described herein, the nucleic acid
molecules can encode all, some or none of the polypeptides. Thus, one or more of the nucleic
acid molecules (e.g., expression cassettes) described herein and/or one or more of the
polypeptides described herein can be co-administered in any order and via any administration
routes. Therefore, any combination of polynucleotides and/or polypeptides described herein
can be used to elicit an immune reaction.
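
Because the compositions can be given in any order, the space of possible regimens is every ordered sequence of priming and boosting doses drawn from the available modalities. The following sketch enumerates that space; the modality labels (with "DNA+protein" standing for co-administration at one dose) and the function name are hypothetical.

```python
from itertools import product

# Hypothetical labels; "DNA+protein" stands for co-administration at a single dose.
MODALITIES = ("DNA", "protein", "DNA+protein")


def prime_boost_schedules(n_primes, n_boosts, modalities=MODALITIES):
    """Enumerate every ordered regimen of n_primes priming doses then n_boosts boosts."""
    phases = ["prime"] * n_primes + ["boost"] * n_boosts
    return [tuple(zip(phases, combo))
            for combo in product(modalities, repeat=n_primes + n_boosts)]
```

With three modalities, a regimen of n doses has 3**n possible orderings, so one prime and one boost already give 9 schedules, including protein-prime/DNA-boost as well as the reverse.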
Experimental 

Below are examples of specific embodiments for carrying out the present invention. 
The examples are offered for illustrative purposes only, and are not intended to limit the 
scope of the present invention in any way. 

Efforts have been made to ensure accuracy with respect to numbers used (e.g., 
amounts, temperatures, etc.), but some experimental error and deviation should, of course, be 
allowed for. 

Example 1 

Generation of Synthetic Expression Cassettes 
A. Modification of HIV-1 Env, Gag and Pol Nucleic Acid Coding Sequences

The Pol coding sequences were selected from Type C strain AF110975. The Gag
coding sequences were selected from the Type C strains AF110965 and AF110967. The Env
coding sequences were selected from Type C strains AF110968 and AF110975. These
sequences were manipulated to maximize expression of their gene products.

First, the HIV-1 codon usage pattern was modified so that the resulting nucleic acid
coding sequence was comparable to codon usage found in highly expressed human genes.
HIV codon usage reflects a high content of the nucleotides A or T in the codon triplet.
The effect of the HIV-1 codon usage is a high AT content in the DNA sequence, which results in
reduced translation efficiency and instability of the mRNA. In comparison, highly expressed
human genes prefer codons containing the nucleotides G or C. The coding sequences were
therefore modified to be comparable to codon usage found in highly expressed human genes.
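
The codon-replacement idea can be sketched as follows: translate the coding sequence, then re-encode each amino acid with a preferred synonymous codon, so the protein is unchanged while the GC content rises. The authors matched codon usage of highly expressed human genes; this sketch substitutes a much cruder, hypothetical rule (pick the synonymous codon with the most G/C bases) and uses the standard genetic code. The example sequence is invented for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (first, second, third base); "*" = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def gc_count(seq: str) -> int:
    return sum(base in "GC" for base in seq.upper())


def gc_fraction(seq: str) -> float:
    return gc_count(seq) / len(seq) if seq else 0.0


def translate(seq: str) -> str:
    """Translate complete codons in frame 1; a trailing partial codon is ignored."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


# Hypothetical proxy for human codon preference: for each amino acid keep the
# synonymous codon with the most G/C bases (first one found on ties).
PREFERRED = {}
for codon, aa in CODON_TABLE.items():
    if aa not in PREFERRED or gc_count(codon) > gc_count(PREFERRED[aa]):
        PREFERRED[aa] = codon


def recode(seq: str) -> str:
    """Replace every codon with its GC-richest synonym; the protein is unchanged."""
    return "".join(PREFERRED[aa] for aa in translate(seq))
```

For the invented AT-rich fragment `ATGAAAATAGAATTA` (Met-Lys-Ile-Glu-Leu), `recode` returns `ATGAAGATCGAGCTC`, raising GC content from 2/15 to 7/15 with the same encoded peptide.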

Second, there are inhibitory (or instability) elements (INS) located within the Gag and
Gag-protease coding sequences (Schneider R, et al., J Virol 71(7):4892-4903, 1997). The
Rev-responsive element (RRE) is a secondary RNA structure that interacts with the
HIV-encoded Rev protein to overcome the expression down-regulating effects of the INS. To
overcome the dependence on the post-transcriptional activating mechanisms of RRE and Rev,
the instability elements are inactivated by introducing multiple point mutations that do not
alter the reading frame of the encoded proteins. Figures 5 and 6 (SEQ ID Nos: 3, 4, 20 and 21)
show the location of some remaining INS in synthetic sequences derived from strains
AF110965 and AF110967. The changes made to these sequences are boxed in the Figures. In
Figures 5 and 6, the top line depicts a modified sequence of Gag polypeptides from the
indicated strains. The nucleotide(s) appearing below the line in the boxed region(s) depict
changes made to further remove INS. Thus, when the changes indicated in the boxed regions
are made, the resulting sequences correspond to the sequences depicted in Figures 1 and 2,
respectively.
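
The requirement that INS-inactivating changes be point mutations that leave the reading frame and encoded protein intact can be checked mechanically by comparing two versions of a coding sequence. This sketch only verifies that the substitutions are silent at the protein level; whether a given change actually removes INS activity is an experimental question it does not address. The helper names and example sequences are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str) -> str:
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def point_mutations(original: str, modified: str) -> list:
    """Return (1-based position, old base, new base) for each substitution."""
    if len(original) != len(modified):
        raise ValueError("insertions/deletions would shift the reading frame")
    return [(i + 1, a, b)
            for i, (a, b) in enumerate(zip(original.upper(), modified.upper()))
            if a != b]


def is_silent(original: str, modified: str) -> bool:
    """True if the edit is substitution-only and the encoded protein is unchanged."""
    return len(original) == len(modified) and translate(original) == translate(modified)
```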
The synthetic coding sequences are assembled by methods known in the art, for
example by companies such as the Midland Certified Reagent Company (Midland, Texas).

In one embodiment of the invention, sequences encoding Pol polypeptides are
included with the synthetic Gag or Env sequences in order to increase the number of epitopes
for virus-like particles expressed by the synthetic, modified Gag/Env expression cassette.
Because synthetic HIV-1 Pol expresses the functional enzymes reverse transcriptase (RT) and
integrase (INT) (in addition to the structural proteins and protease), it may be helpful in some
instances to inactivate RT and INT functions. Several deletions or mutations in the RT and
INT coding regions can be made to achieve catalytically nonfunctional enzymes with respect
to their RT and INT activity. (Jay A. Levy (Editor) (1995) The Retroviridae, Plenum Press,
New York. ISBN 0-306-45033X. Pages 215-20; Grimison, B. and Laurence, J. (1995)
Journal Of Acquired Immune Deficiency Syndromes and Human Retrovirology 9(1):58-68;
Wakefield, J. K., et al., (1992) Journal Of Virology 66(11):6806-6812; Esnouf, R., et al.,
(1995) Nature Structural Biology 2(4):303-308; Maignan, S., et al., (1998) Journal Of
Molecular Biology 282(2):359-368; Katz, R. A. and Skalka, A. M. (1994) Annual Review Of
Biochemistry 73 (1994); Jacobo-Molina, A., et al., (1993) Proceedings Of the National
Academy Of Sciences Of the United States Of America 90(13):6320-6324; Hickman, A. B., et
al., (1994) Journal Of Biological Chemistry 269(46):29279-29287; Goldgur, Y., et al., (1998)
Proceedings Of the National Academy Of Sciences Of the United States Of America
95(16):9150-9154; Goette, M., et al., (1998) Journal Of Biological Chemistry
273(17):10139-10146; Gorton, J. L., et al., (1998) Journal of Virology 72(6):5046-5055;
Engelman, A., et al., (1997) Journal Of Virology 71(5):3507-3514; Dyda, F., et al., Science
266(5193):1981-1986; Davies, J. F., et al., (1991) Science 252(5002):88-95; Bujacz, G., et
al., (1996) FEBS Letters 398(2-3):175-178; Beard, W. A., et al., (1996) Journal Of Biological
Chemistry 271(21):12213-12220; Kohlstaedt, L. A., et al., (1992) Science
256(5065):1783-1790; Krug, M. S. and Berger, S. L. (1991) Biochemistry 30(44):10614-10623;
Mazumder, A., et al., (1996) Molecular Pharmacology 49(4):621-628; Palaniappan, C., et al.,
(1997) Journal Of Biological Chemistry 272(17):11157-11164; Rodgers, D. W., et al., (1995)
Proceedings Of the National Academy Of Sciences Of the United States Of America
92(4):1222-1226; Sheng, N. and Dennis, D. (1993) Biochemistry 32(18):4938-4942; Spence,
R. A., et al., (1995) Science 267(5200):988-993.)
Furthermore, selected B- and/or T-cell epitopes can be added to the Pol constructs
(e.g., y of the truncated INT or within the deletions of the RT- and INT-coding sequence) to
replace and augment any epitopes deleted by the functional modifications of RT and INT.
Alternately, selected B- and T-cell epitopes (including CTL epitopes) from RT and INT can
be included in a minimal VLP formed by expression of the synthetic Gag or synthetic Pol
cassette, described above. (For descriptions of known HIV B- and T-cell epitopes, see HIV
Molecular Immunology Database CTL Search Interface; Los Alamos Sequence Compendia,
1987-1997; Internet address: http://hiv-web.lanl.gov/immunology/index.html.)

The resulting modified coding sequences are presented as a synthetic Env expression
cassette, a synthetic Gag expression cassette, and a synthetic Pol expression cassette. A common
Gag region (Gag-common) extends from nucleotide position 844 to position 903 (SEQ ID
NO:1), relative to AF110965 (or from approximately amino acid residues 282 to 301 of SEQ
ID NO:17), and from nucleotide position 841 to position 900 (SEQ ID NO:2), relative to
AF110967 (or from approximately amino acid residues 281 to 300 of SEQ ID NO:22). A
common Env region (Env-common) extends from nucleotide position 1213 to position 1353
(SEQ ID NO:5) and amino acid positions 405 to 451 of SEQ ID NO:23, relative to
AF110968, and from nucleotide position 1210 to position 1353 (SEQ ID NO:11) and amino
acid positions 404 to 451 (SEQ ID NO:24), relative to AF110975.
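The nucleotide and amino acid coordinates quoted above are related by simple codon arithmetic. The minimal Python sketch below converts each residue range into the nucleotide span of its codons and compares the result with the positions given in the text. It assumes, which the text does not state explicitly, that residue 1 is encoded by nucleotides 1-3 of the same coordinate system; the function name `aa_range_to_nt` is hypothetical.

```python
def aa_range_to_nt(first_residue: int, last_residue: int) -> tuple:
    """Map a 1-based, inclusive amino acid range to the 1-based, inclusive
    nucleotide range of its codons, assuming residue 1 is encoded by nt 1-3."""
    start_nt = (first_residue - 1) * 3 + 1  # first base of the first codon
    end_nt = last_residue * 3               # last base of the last codon
    return start_nt, end_nt


# (residue range, nucleotide range) pairs as quoted in the text.
COMMON_REGIONS = {
    "Gag-common, AF110965": ((282, 301), (844, 903)),
    "Gag-common, AF110967": ((281, 300), (841, 900)),
    "Env-common, AF110968": ((405, 451), (1213, 1353)),
    "Env-common, AF110975": ((404, 451), (1210, 1353)),
}

for name, (aa, nt) in COMMON_REGIONS.items():
    computed = aa_range_to_nt(*aa)
    n_residues = aa[1] - aa[0] + 1
    print(f"{name}: residues {aa[0]}-{aa[1]} ({n_residues} aa) -> "
          f"nt {computed[0]}-{computed[1]}; matches text: {computed == nt}")
```

All four ranges are consistent with this mapping. The Gag regions span 20 residues and the Env regions 47 and 48 residues, in line with the "approximately 20" and "approximately 47" amino acids described for the MHR and the common Env region.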

The synthetic DNA fragments for Pol, Gag and Env are cloned into the following
eukaryotic expression vectors:

pCMVKm2, for transient expression assays and DNA immunization studies. The pCMVKm2
vector is derived from pCMV6a (Chapman et al., Nuc. Acids Res. (1991) 19:3979-3986) and
comprises a kanamycin selectable marker, a ColE1 origin of replication, a CMV promoter
enhancer and Intron A, followed by an insertion site for the synthetic sequences described
below and a polyadenylation signal derived from bovine growth hormone. The pCMV-link
vector differs from pCMVKm2 only in that a polylinker site has been inserted into pCMVKm2
to generate pCMV-link;

pESN2dhfr and pCMVPLEdhfr, for expression in Chinese Hamster Ovary (CHO) cells; and

pAcC13, a shuttle vector for use in the Baculovirus expression system (pAcC13 is derived
from pAcC12, which is described by Munemitsu S., et al., Mol Cell Biol 10(11):5977-5982, 1990).

Briefly, construction of pCMVPLEdhfr was as follows.

To construct a DHFR cassette, the EMCV IRES (internal ribosome entry site) leader
was PCR-amplified from pCite-4a+ (Novagen, Inc., Milwaukee, WI) and inserted into pET-23d
(Novagen, Inc., Milwaukee, WI) as an Xba-Nco fragment to give pET-EMCV. The dhfr
gene was PCR-amplified from pESN2dhfr to give a product with a Gly-Gly-Gly-Ser spacer in
place of the translation stop codon and inserted as an Nco-BamHI fragment to give
pET-E-DHFR. Next, the attenuated neo gene was PCR-amplified from a pSV2Neo (Clontech,
Palo Alto, CA) derivative and inserted into the unique BamHI site of pET-E-DHFR to give
pET-E-DHFR/Neo(m2). Finally, the bovine growth hormone terminator from pCDNA3 (Invitrogen,
Inc., Carlsbad, CA) was inserted downstream of the neo gene to give
pET-E-DHFR/Neo(m2)BGHt. The EMCV-dhfr/neo selectable marker cassette fragment was
prepared by cleavage of pET-E-DHFR/Neo(m2)BGHt.

The CMV enhancer/promoter plus Intron A was transferred from pCMV6a (Chapman
et al., Nuc. Acids Res. (1991) 19:3979-3986) as a HindIII-SalI fragment into pUC19 (New
England Biolabs, Inc., Beverly, MA). The vector backbone of pUC19 was deleted from the
NdeI to the SapI sites. The DHFR cassette described above was added to the construct such
that the EMCV IRES followed the CMV promoter. The vector also contained an ampR gene
and an SV40 origin of replication.

B. Defining the Major Homology Region (MHR) of HIV-1 p55Gag

The Major Homology Region (MHR) of HIV-1 p55 (Gag) is located in the p24-CA
sequence of Gag. It is a conserved stretch of approximately 20 amino acids. The position in
the wild-type AF110965 Gag protein is from 282 to 301 (SEQ ID NO:25), and the region spans
nucleotides 844 to 903 (SEQ ID NO:26) of the Gag DNA sequence. The position in the synthetic
Gag protein is also from 282 to 301 (SEQ ID NO:25), and the region spans nucleotides 844 to 903
(SEQ ID NO:1) of the synthetic Gag DNA sequence. The position in the wild-type and synthetic
AF110967 Gag protein is from 281 to 300 (SEQ ID NO:27), and the region spans nucleotides
841 to 900 (SEQ ID NO:2) of the modified Gag DNA sequence. Mutations or deletions in the MHR
can severely impair particle production (Borsetti, A., et al., J. Virol. 72(11):9313-9317, 1998;
Mammano, F., et al., J. Virol. 68(8):4927-4936, 1994).

Percent identity to this sequence can be determined, for example, using the Smith-Waterman
search algorithm (Time Logic, Incline Village, NV), with the following exemplary
parameters: weight matrix = nuc4x4hb; gap opening penalty = 20, gap extension penalty = 5.
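To make the percent-identity criterion concrete, here is a minimal Python sketch of Smith-Waterman local alignment with affine gap penalties (Gotoh's formulation), using the gap opening penalty of 20 and gap extension penalty of 5 given above. It is not the TimeLogic implementation. The nuc4x4hb weight matrix is replaced by an assumed +5 match / -4 mismatch score, a gap of length k is assumed to cost 20 + 5(k - 1), and percent identity is taken here as identical positions divided by alignment columns (gap columns included). All three conventions are assumptions; other tools define them differently.

```python
NEG_INF = -10**9  # stands in for minus infinity in the gap matrices


def smith_waterman_identity(a, b, match=5, mismatch=-4, gap_open=20, gap_extend=5):
    """Local alignment of DNA strings a and b with affine gaps.

    Returns (best_score, percent_identity) for the highest-scoring local alignment.
    A gap of length k costs gap_open + (k - 1) * gap_extend (assumed convention).
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]        # best score ending at (i, j)
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ... ending with a gap in a
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ... ending with a gap in b
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell, counting identical positions and columns.
    i, j = best_pos
    state, matches, columns = "H", 0, 0
    while i > 0 and j > 0:
        if state == "H":
            if H[i][j] == 0:
                break
            s = match if a[i - 1] == b[j - 1] else mismatch
            if H[i][j] == H[i - 1][j - 1] + s:
                matches += a[i - 1] == b[j - 1]
                columns += 1
                i, j = i - 1, j - 1
            elif H[i][j] == E[i][j]:
                state = "E"
            else:
                state = "F"
        elif state == "E":  # column pairs a gap with b[j - 1]
            columns += 1
            if E[i][j] == H[i][j - 1] - gap_open:
                state = "H"  # this gap was opened here
            j -= 1
        else:  # state "F": column pairs a[i - 1] with a gap
            columns += 1
            if F[i][j] == H[i - 1][j] - gap_open:
                state = "H"
            i -= 1
    identity = 100.0 * matches / columns if columns else 0.0
    return best, identity


print(smith_waterman_identity("ACGTTGCA", "ACGTAGCA"))  # (31, 87.5)
```

With a gap opening penalty this large relative to the mismatch score, short sequences that differ by point substitutions align without gaps, so percent identity reduces to the fraction of matching positions in the local alignment.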

C. Defining the Common Sequence Region of HIV-1 Env

The common sequence region (CSR) of HIV-1 Env is located in the C4 sequence of
Env. It is a conserved stretch of approximately 47 amino acids. The position in
the wild-type and synthetic AF110968 Env protein is from approximately amino acid residue
405 to 451 (SEQ ID NO:28), and the region spans nucleotides 1213 to 1353 (SEQ ID NO:5) of the
Env DNA sequence. The position in the wild-type and synthetic AF110975 Env protein is
from approximately amino acid residue 404 to 451 (SEQ ID NO:29), and the region spans
nucleotides 1210 to 1353 (SEQ ID NO:11) of the Env DNA sequence.

Percent identity to this sequence can be determined, for example, using the Smith- 
Waterman search algorithm (Time Logic, Incline Village, NV), with the following exemplary 
parameters: weight matrix = nuc4x4hb; gap opening penalty = 20, gap extension penalty = 5. 

Various forms of the different embodiments of the invention, described herein, may 
be combined. 

D. Exemplary HIV Sequences Derived from South African HIV Type C Strains

HIV coding sequences of novel Type C isolates were obtained. Polypeptide-coding 
sequences were manipulated to maximize expression of their gene products. 

As described above, the HIV-1 codon usage pattern was modified so that the resulting
nucleic acid coding sequence was comparable to codon usage found in highly expressed
human genes. HIV codon usage reflects a high content of the nucleotides A or T in the
codon triplet. The effect of HIV-1 codon usage is a high AT content in the DNA
sequence, which results in decreased translation efficiency and instability of the mRNA. In
comparison, codons in highly expressed human genes favor the nucleotides G or C. The HIV
coding sequences were therefore modified toward the codon usage of highly expressed human
genes.
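As a minimal illustration of the AT-to-GC shift described above, the Python sketch below computes overall GC content and third-codon-position GC content (GC3) for the native and sequence-modified TV1c8.2 leader DNA sequences listed in Table 4. The function names are hypothetical, and GC content is used here only as a rough proxy for the codon-usage optimization; it is not presented as the method used to design the synthetic sequences.

```python
# Leader DNA sequences copied from Table 4 (TV1c8.2 signal peptide leader).
# Both encode the same peptide, MRVMGTQKNCQQWWIWGILGFWMLMIC.
WT_NATIVE = ("ATGAGAGTGATGGGGACACAGA" "AGAATTGTCAACAATGGTGGATA"
             "TGGGGCATCTTAGGCTTCTGGAT" "GCTAATGATTTGT")
WT_MOD = ("ATGCGCGTGATGGGCACCCAGAA" "GAACTGCCAGCAGTGGTGGATCT"
          "GGGGCATCCTGGGCTTCTGGATG" "CTGATGATCTGC")


def gc_percent(seq: str) -> float:
    """Percentage of G or C bases over the whole sequence."""
    return 100.0 * sum(base in "GC" for base in seq.upper()) / len(seq)


def gc3_percent(seq: str) -> float:
    """Percentage of G or C at the third position of each codon (reading frame 1)."""
    third_positions = seq.upper()[2::3]
    return 100.0 * sum(base in "GC" for base in third_positions) / len(third_positions)


for name, seq in (("WTnative", WT_NATIVE), ("WTmod", WT_MOD)):
    print(f"{name}: {len(seq)} nt, GC = {gc_percent(seq):.1f}%, "
          f"GC3 = {gc3_percent(seq):.1f}%")
```

For these two 81-nt leaders, which encode the same peptide, the modified version contains 48 G or C bases versus 35 in the native sequence, and it uses G or C at all 27 third codon positions versus 16 of 27 in the native sequence.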

Shown below in Table C are exemplary wild-type and synthetic sequences derived
from a novel South African HIV Type C isolate, clone 8_5_TV1_C.ZA. Table D shows
exemplary synthetic Env sequences derived from a novel South African HIV Type C isolate,
clone 8_2_TV1_C.ZA. Table E shows wild-type and synthetic sequences derived from South
African HIV Type C strain 12-5_1_TV2_C.ZA.
Table C

Name | SEQ ID NO | Description
C4_Env_TV1_C_ZA_opt short | 46 | synthetic sequence of short Env "common region"
C4_Env_TV1_C_ZA_opt | 47 | synthetic sequence of Env common region
C4_Env_TV1_C_ZA_wt | 48 | wild type TV1_C.ZA Env sequence
Envgp160_TV1_C_ZAopt | 49 | synthetic Env gp160
Envgp160_TV1_C_ZAwt | 50 | wild type 8_5_TV1_C.ZA Env gp160 sequence
Gag_TV1_C_ZAopt | 51 | synthetic sequence of Gag
Gag_TV1_C_ZAwt | 52 | wild type 8_5_TV1_C.ZA Gag sequence
Gag_TV1_ZA_MHRopt | 53 | synthetic sequence of Gag major homology region
Gag_TV1_ZA_MHRwt | 54 | wild type 8_5_TV1_C.ZA Gag major homology region sequence
Nef_TV1_C_ZAopt | 55 | synthetic sequence of Nef
Nef_TV1_C_ZAwt | 56 | wild type 8_5_TV1_C.ZA Nef sequence
NefD125G_TV1_C_ZAopt | 57 | synthetic sequence of Nef, including mutation at position 125 resulting in non-functional gene product
p15RNaseH_TV1_C_ZAopt | 58 | synthetic sequence of RNase H (p15 of Pol)
p15RNaseH_TV1_C_ZAwt | 59 | wild type 8_5_TV1_C.ZA RNase H sequence
p31Int_TV1_C_ZAopt | 60 | synthetic sequence of integrase (p31 of Pol)
p31Int_TV1_C_ZAwt | 61 | wild type 8_5_TV1_C.ZA integrase sequence
Pol_TV1_C_ZAopt | 62 | synthetic sequence of Pol
Pol_TV1_C_ZAwt | 63 | wild type 8_5_TV1_C.ZA Pol sequence
Prot_TV1_C_ZAopt | 64 | synthetic sequence of Prot
Prot_TV1_C_ZAwt | 65 | wild type 8_5_TV1_C.ZA Prot sequence
ProtIna_TV1_C_ZAopt | 66 | synthetic sequence of Prot, including mutation resulting in inactivation of protease
ProtIna_TV1_C_ZAwt | 67 | wild type 8_5_TV1_C.ZA Prot sequence, including mutation resulting in inactivation of protease
ProtInaRTmut_TV1_C_ZAopt | 68 | synthetic sequences of Prot and reverse transcriptase (RT), including mutation resulting in inactivation of protease and mutation resulting in inactivation of RT
ProtInaRTmut_TV1_C_ZAwt | 69 | wild type 8_5_TV1_C.ZA Prot and RT, including mutation resulting in inactivation of protease and mutation resulting in inactivation of RT
ProtwtRTwt_TV1_C_ZAopt | 70 | synthetic sequences of Prot and RT
ProtwtRTwt_TV1_C_ZAwt | 71 | wild type 8_5_TV1_C.ZA Prot and RT
RevExon1_TV1_C_ZAopt | 72 | synthetic sequence of exon 1 of Rev
RevExon1_TV1_C_ZAwt | 73 | wild type 8_5_TV1_C.ZA exon 1 of Rev
RevExon2_TV1_C_ZAopt-2 | 74 | synthetic sequence of exon 2 of Rev
RevExon2_TV1_C_ZAwt | 75 | wild type 8_5_TV1_C.ZA exon 2 of Rev
RT_TV1_C_ZAopt | 76 | synthetic sequence of RT
RT_TV1_C_ZAwt | 77 | wild type 8_5_TV1_C.ZA RT
RTmut_TV1_C_ZAopt | 78 | synthetic sequence of RT, including mutation resulting in inactivation of RT
RTmut_TV1_C_ZAwt | 79 | wild type 8_5_TV1_C.ZA RT, including mutation resulting in inactivation of RT
TatC22Exon1_TV1_C_ZAopt | 80 | synthetic sequence of exon 1 of Tat, including mutation resulting in non-functional Tat gene product
TatExon1_TV1_C_ZAopt | 81 | synthetic sequence of exon 1 of Tat
TatExon1_TV1_C_ZAwt | 82 | wild type 8_5_TV1_C.ZA exon 1 of Tat
TatExon2_TV1_C_ZAopt | 83 | synthetic sequence of exon 2 of Tat
TatExon2_TV1_C_ZAwt | 84 | wild type 8_5_TV1_C.ZA exon 2 of Tat
Vif_TV1_C_ZAopt | 85 | synthetic sequence of Vif
Vif_TV1_C_ZAwt | 86 | wild type 8_5_TV1_C.ZA Vif
Vpr_TV1_C_ZAopt | 87 | synthetic sequence of Vpr
Vpr_TV1_C_ZAwt | 88 | wild type 8_5_TV1_C.ZA Vpr
Vpu_TV1_C_ZAopt | 89 | synthetic sequence of Vpu
Vpu_TV1_C_ZAwt | 90 | wild type 8_5_TV1_C.ZA Vpu
RevExon1_2_TV1_C_ZAopt | 91 | synthetic sequence of exons 1 and 2 of Rev
RevExon1_2_TV1_C_ZAwt | 92 | wild type 8_5_TV1_C.ZA Rev (exons 1 and 2)
TatC22Exon1_2_TV1_C_ZAopt | 93 | synthetic sequence of exons 1 and 2 of Tat, including mutation in exon 1 resulting in non-functional Tat gene product
TatExon1_2_TV1_C_ZAopt | 94 | synthetic sequence of exons 1 and 2 of Tat
TatExon1_2_TV1_C_ZAwt | 95 | wild type 8_5_TV1_C.ZA Tat (exons 1 and 2)
NefD125G-Myr_TV1_C_ZAopt | 96 | synthetic sequence of Nef, including mutation eliminating myristoylation site
Table D

Name | SEQ ID NO | Description
gp120mod.TV1.delV2 | 119 | synthetic sequence of Env gp120, including V2 deletion and modified leader sequences derived from wild-type 8_2_TV1_C.ZA sequences
gp140mod.TV1.delV2 | 120 | synthetic sequence of Env gp140, including V2 deletion and modified leader sequences derived from wild-type 8_2_TV1_C.ZA sequences
gp140mod.TV1.mut7.delV2 | 121 | synthetic sequence of Env gp140, including V2 deletion, a mutation in the cleavage site, and modified leader sequences derived from wild-type 8_2_TV1_C.ZA sequences
gp160mod.TV1.delV1V2 | 122 | synthetic sequence of Env gp160, including V1/V2 deletion and modified leader derived from wild-type 8_2_TV1_C.ZA sequences
gp160mod.TV1.delV2 | 123 | synthetic sequence of Env gp160, including V2 deletion and modified leader sequences derived from wild-type 8_2_TV1_C.ZA sequences
gp160mod.TV1.mut7.delV2 | 124 | synthetic sequence of Env gp160, including V2 deletion, a mutation in the cleavage site, and modified leader sequences derived from wild-type 8_2_TV1_C.ZA sequences
gp160mod.TV1.tpa1 | 125 | synthetic sequence of Env gp160, TPA1 leader
gp160mod.TV1 | 126 | synthetic sequence of Env gp160, including modified leader sequences derived from wild-type (8_2_TV1_C.ZA) sequences
gp160mod.TV1.wtLnative | 127 | synthetic sequence of Env gp160, including wild type 8_2_TV1_C.ZA (unmodified) leader
gp140.mod.TV1.tpa1 | 131 | synthetic sequence of Env gp140, TPA1 leader
gp140mod.TV1 | 132 | synthetic sequence of Env gp140, including modified leader sequences derived from wild-type 8_2_TV1_C.ZA sequences
gp140mod.TV1.wtLnative | 133 | synthetic sequence of Env gp120, including wild type 8_2_TV1_C.ZA (unmodified) leader sequence



As noted above, Env-encoding constructs can be prepared using any of the full-length
gp160 constructs. For example, a gp140 form (SEQ ID NO:132) was made by truncating
gp160 (SEQ ID NO:126) at nucleotide 2064, and gp120 was made by truncating gp160 (SEQ ID
NO:126) at nucleotide 1551. Additional gp140 and gp120 forms can be
made using the methods described herein. One or more stop codons are typically added (e.g.,
nucleotides 2608 to 2610 of SEQ ID NO:126). Further, the wild-type leader sequence can be
modified and/or replaced with other leader sequences (e.g., TPA1 leader sequences).
Thus, the polypeptide gp160 includes the coding sequences for gp120 and gp41. The
polypeptide gp41 comprises several domains, including an oligomerization domain (OD)
and a transmembrane spanning domain (TM). In the native envelope, the oligomerization
domain is required for the non-covalent association of three gp41 polypeptides to form a
trimeric structure; through non-covalent interactions with the gp41 trimer (and with one
another), the gp120 polypeptides are also organized in a trimeric structure. A cleavage site
(or cleavage sites) exists approximately between the polypeptide sequences for gp120 and the
polypeptide sequences corresponding to gp41. This cleavage site (or sites) can be mutated to
prevent cleavage at the site. The resulting gp140 polypeptide corresponds to a truncated form
of gp160 in which the transmembrane spanning domain of gp41 has been deleted. This gp140
polypeptide can exist in both monomeric and oligomeric (i.e., trimeric) forms by virtue of the
presence of the oligomerization domain in the gp41 moiety. When the cleavage site has been
mutated to prevent cleavage and the transmembrane portion of gp41 has been deleted, the
resulting polypeptide product is designated "mutated" gp140 (e.g., gp140.mut). As will be
apparent to those in the field, the cleavage site can be mutated in a variety of ways. In the
exemplary constructs described herein (e.g., SEQ ID NO:121 and SEQ ID NO:124), the
mutation in the gp120/gp41 cleavage site changes the wild-type amino acid sequence
KRRWQREKR (SEQ ID NO:129) to ISSWQSEKS (SEQ ID NO:130).
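The cleavage-site mutation can be summarized residue by residue. The Python sketch below lists the substitutions that convert KRRWQREKR (SEQ ID NO:129) into ISSWQSEKS (SEQ ID NO:130) and scans both peptides for the commonly cited furin recognition motif R-X-(K/R)-R. The motif definition comes from general knowledge of furin-type proprotein convertases, not from this document, and is used here only as an assumed, simplified illustration. The function name `substitutions` is hypothetical.

```python
import re

WILD_TYPE = "KRRWQREKR"  # SEQ ID NO:129, wild-type gp120/gp41 cleavage region
MUTANT = "ISSWQSEKS"     # SEQ ID NO:130, mutated cleavage region

# Commonly cited furin consensus R-X-(K/R)-R; the lookahead lets matches overlap.
FURIN_SITE = re.compile(r"(?=(R.[KR]R))")


def substitutions(wt: str, mut: str) -> list:
    """List point substitutions as <wild-type residue><1-based position><new residue>."""
    if len(wt) != len(mut):
        raise ValueError("sequences must be the same length")
    return [f"{w}{pos}{m}" for pos, (w, m) in enumerate(zip(wt, mut), start=1) if w != m]


print(substitutions(WILD_TYPE, MUTANT))  # ['K1I', 'R2S', 'R3S', 'R6S', 'R9S']
print(FURIN_SITE.findall(WILD_TYPE))     # ['REKR']
print(FURIN_SITE.findall(MUTANT))        # []
```

Under this simplified motif definition, the wild-type peptide contains one candidate site (REKR), while the mutant, in which all four arginines and the N-terminal lysine are replaced, contains none.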

In yet other embodiments, hypervariable region(s) were deleted, N-glycosylation sites
were removed and/or cleavage sites were mutated. In exemplary constructs having variable
region deletions (V1 and/or V2), V2 deletions were constructed by deleting nucleotides from
approximately 499 to approximately 593 (relative to SEQ ID NO:128), and V1/V2 deletions
were constructed by deleting nucleotides from approximately 375 to approximately 602
(relative to SEQ ID NO:128). The relative locations of V1 and/or V2 regions can also be
readily determined by alignment to the regions shown in Table A. Table E shows wild-type
and synthetic sequences derived from South African HIV Type C strain 12-5_1_TV2_C.ZA.



Table E

Name | SEQ ID NO | Description
Envgp160_TV2_C_ZAopt | 97 | synthetic sequence of Env gp160
Envgp160_TV2_C_ZAwt | 98 | wild type 12-5_1_TV2_C.ZA Env gp160
Gag_TV2_C_ZAopt | 99 | synthetic sequence of Gag
Gag_TV2_C_ZAwt | 100 | wild type 12-5_1_TV2_C.ZA Gag
Nef_TV2_C_ZAopt | 101 | synthetic sequence of Nef
Nef_TV2_C_ZAwt | 102 | wild type 12-5_1_TV2_C.ZA Nef
Pol_TV2_C_ZAopt | 103 | synthetic sequence of Pol
Pol_TV2_C_ZAwt | 104 | wild type 12-5_1_TV2_C.ZA Pol
RevExon1_TV2_C_ZAopt | 105 | synthetic sequence of exon 1 of Rev
RevExon1_TV2_C_ZAwt | 106 | wild type 12-5_1_TV2_C.ZA exon 1 of Rev
RevExon2_TV2_C_ZAopt | 107 | synthetic sequence of exon 2 of Rev
RevExon2_TV2_C_ZAwt | 108 | wild type 12-5_1_TV2_C.ZA exon 2 of Rev
TatExon1_TV2_C_ZAopt | 109 | synthetic sequence of exon 1 of Tat
TatExon1_TV2_C_ZAwt | 110 | wild type 12-5_1_TV2_C.ZA exon 1 of Tat
TatExon2_TV2_C_ZAopt | 111 | synthetic sequence of exon 2 of Tat
TatExon2_TV2_C_ZAwt | 112 | wild type 12-5_1_TV2_C.ZA exon 2 of Tat
Vif_TV2_C_ZAopt | 113 | synthetic sequence of Vif
Vif_TV2_C_ZAwt | 114 | wild type 12-5_1_TV2_C.ZA Vif
Vpr_TV2_C_ZAopt | 115 | synthetic sequence of Vpr
Vpr_TV2_C_ZAwt | 116 | wild type 12-5_1_TV2_C.ZA Vpr
Vpu_TV2_C_ZAopt | 117 | synthetic sequence of Vpu
Vpu_TV2_C_ZAwt | 118 | wild type 12-5_1_TV2_C.ZA Vpu



It will be readily apparent that sequences derived from any HIV type C strain or clone
can be modified as described herein in order to achieve desirable modifications in that strain.
Additionally, polyproteins can be constructed by fusing in-frame two or more polynucleotide
sequences encoding polypeptide or peptide products. Further, polycistronic coding sequences
may be produced by placing two or more polynucleotide sequences encoding polypeptide
products adjacent to each other, typically under the control of one promoter, wherein each
polypeptide coding sequence may be modified to include sequences for internal ribosome
binding sites.
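The in-frame fusion described above can be sketched as follows. This minimal Python example assumes that each segment is an open reading frame beginning at its first base. It removes the terminal stop codon from every segment except the last, and it rejects segments that are not a whole number of codons or that contain an internal stop. The function name and the short test sequences are hypothetical, and real constructs may also need linker sequences, which are not modeled here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuse_in_frame(*orfs: str) -> str:
    """Join ORFs into one in-frame coding sequence for a polyprotein."""
    fused = []
    for k, orf in enumerate(orfs):
        orf = orf.upper()
        if len(orf) % 3:
            raise ValueError(f"segment {k} is not a whole number of codons")
        codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
        is_last = k == len(orfs) - 1
        if not is_last and codons and codons[-1] in STOP_CODONS:
            codons = codons[:-1]  # drop the stop so translation reads through
        # Only the final codon of the final segment may be a stop codon.
        body = codons[:-1] if is_last else codons
        if any(c in STOP_CODONS for c in body):
            raise ValueError(f"segment {k} contains an internal stop codon")
        fused.extend(codons)
    return "".join(fused)


print(fuse_in_frame("ATGAAATAA", "ATGCCCTGA"))  # ATGAAAATGCCCTGA
```

Checking codon counts and internal stops before joining catches the two most common ways a fusion silently goes out of frame or terminates early.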
The sequences of the present invention, for example, the modified (synthetic)
polynucleotide sequences encoding HIV polypeptides, may be modified by deletions, point
mutations, substitutions, frame-shifts, and/or further genetic modifications (for example,
mutations leading to inactivation of an activity associated with a polypeptide, e.g., mutations
that inactivate protease, tat, or reverse transcriptase activity). Such modifications are taught
generally in the art and may be applied in the context of the teachings of the present
invention. For example, sites corresponding to the "Regions of the HIV Genome" listed in
Table A may be modified in the corresponding regions of the novel sequences disclosed
herein in order to achieve desirable modifications. Further, the modified (synthetic)
polynucleotide sequences of the present invention can be combined for use, e.g., in a
composition for generating an immune response in a subject, in a variety of ways, including
but not limited to the following: multiple individual expression cassettes, each
comprising one polynucleotide sequence of the present invention (e.g., a gag expression
cassette, an env expression cassette, and a rev expression cassette, or a pol expression
cassette, a vif expression cassette, and a vpr expression cassette, etc.); polyproteins produced
by in-frame fusions of multiple polynucleotides of the present invention; and polycistronic
polynucleotides produced using multiple polynucleotides of the present invention.

Example 2 

Expression Assays for the Synthetic Coding Sequences 
A. Type C HIV Coding Sequences

The wild-type Subtype C HIV coding sequences (for example, from AF110965, AF110967,
AF110968 and AF110975, as well as from the novel South African strains 8_5_TV1_C.ZA,
8_2_TV1_C.ZA and 12-5_1_TV2_C.ZA) are cloned into expression vectors
having the same features as the vectors into which the synthetic sequences are cloned.

Expression efficiencies for various vectors carrying the wild-type and synthetic
sequences are evaluated as follows. Cells from several mammalian cell lines (293, RD, COS-7,
and CHO; all obtained from the American Type Culture Collection, 10801 University
Boulevard, Manassas, VA 20110-2209) are transfected with 2 µg of DNA in transfection
reagent LT1 (PanVera Corporation, 545 Science Dr., Madison, WI). The cells are incubated
for 5 hours in reduced serum medium (Opti-MEM, Gibco-BRL, Gaithersburg, MD). The
medium is then replaced with normal medium as follows: 293 cells, IMDM, 10% fetal calf
serum, 2% glutamine (BioWhittaker, Walkersville, MD); RD and COS-7 cells, D-MEM, 10%
fetal calf serum, 2% glutamine (Opti-MEM, Gibco-BRL, Gaithersburg, MD); and CHO cells,
Ham's F-12, 10% fetal calf serum, 2% glutamine (Opti-MEM, Gibco-BRL, Gaithersburg,
MD). The cells are incubated for either 48 or 60 hours. Cell lysates are collected as
described below in Example 3. Supernatants are harvested and filtered through 0.45 µm
syringe filters. Supernatants are evaluated using 96-well plates coated with a
murine monoclonal antibody directed against HIV antigen, for example, a Coulter p24 assay
(Coulter Corporation, Hialeah, FL, US). The HIV-1 antigen binds to the coated wells.
Biotinylated antibodies against HIV recognize the bound antigen. Conjugated
streptavidin-horseradish peroxidase reacts with the biotin. Color develops from the reaction
of peroxidase with TMB substrate. The reaction is terminated by addition of 4 N H2SO4. The
intensity of the color is directly proportional to the amount of HIV antigen in a sample.
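Because the color intensity is described as directly proportional to the amount of antigen, sample concentrations are typically read off a standard curve. The Python sketch below assumes a straight-line (ordinary least squares) fit of optical density against known standard concentrations and inverts that fit for unknown samples. The standard values are hypothetical, and commercial kits such as the Coulter p24 assay may specify their own curve-fitting procedure and linear range.

```python
def fit_standard_curve(concentrations, ods):
    """Ordinary least-squares fit of OD = slope * concentration + intercept."""
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(ods) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concentrations, ods))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def od_to_concentration(od, slope, intercept):
    """Invert the fitted line; only meaningful within the range of the standards."""
    return (od - intercept) / slope


# Hypothetical p24 standards (pg/ml) and their OD readings, for illustration only.
standards_pg_ml = [0.0, 25.0, 50.0, 100.0]
standard_ods = [0.05, 0.30, 0.55, 1.05]
slope, intercept = fit_standard_curve(standards_pg_ml, standard_ods)
print(round(od_to_concentration(0.80, slope, intercept), 1))  # 75.0
```

Samples whose OD falls above the highest standard would normally be diluted and re-assayed rather than extrapolated.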

Synthetic HIV Type C expression cassettes provide dramatic increases in production
of their protein products, relative to the native (wild-type Subtype C) sequences, when
expressed in a variety of cell lines.

B. Signal Peptide Leader Sequences

The ability of various leader sequences to drive expression was tested by transfecting
cells with wild-type or synthetic Env-encoding expression cassettes operably linked to
different leader sequences and evaluating expression of Env polypeptide by ELISA or
Western blot. The amino acid and nucleotide sequences of various signal peptide leader
sequences are shown in Table 4.



Table 4

Leader | Amino acid sequence | DNA sequence
WTnative (8_2_TV1_C.ZA) | MRVMGTQKNCQQWWIWGILGFWMLMIC | ATGAGAGTGATGGGGACACAGAAGAATTGTCAACAATGGTGGATATGGGGCATCTTAGGCTTCTGGATGCTAATGATTTGT
WTmod (8_2_TV1_C.ZA) | MRVMGTQKNCQQWWIWGILGFWMLMIC | ATGCGCGTGATGGGCACCCAGAAGAACTGCCAGCAGTGGTGGATCTGGGGCATCCTGGGCTTCTGGATGCTGATGATCTGC
Tpa1 | MDAMKRGLCCVLLLCGAVFVSPSAS | ATGGATGCAATGAAGAGAGGGCTCTGCTGTGTGCTGCTGCTGTGTGGAGCAGTCTTCGTTTCGCCCAGCGCCAGC
Tpa2 | MDAMKRGLCCVLLLCGAVFVSPS | ATGGATGCAATGAAGAGAGGGCTCTGCTGTGTGCTGCTGCTGTGTGGAGCAGTCTTCGTTTCGCCCAGC


293 cells were transiently transfected using standard methods with native and
sequence-modified constructs encoding the gp120 and gp140 forms of the 8_2_TV1_C.ZA
(TV1c8.2) envelope. Env protein was measured in cell lysates and supernatants using an
in-house Env capture ELISA. Results are shown in Table 5 below and indicate that the
wild-type signal peptide leader sequence of TV1c8.2 can be used to express the encoded
envelope protein efficiently, at levels better than or comparable to those observed using the
heterologous tpa leader sequences. Furthermore, the TV1c8.2 leader works in its native or
sequence-modified form and can be used with native or sequence-modified env genes. All
constructs were tested after cloning of the gene cassettes into the EcoRI and XhoI sites of the
pCMVlink expression vector.
Table 5

TV1c8.2 construct | Supernatant (ng) | Lysate (ng) | Total (ng)
gp140nat.wtL | 532 | 149 | 681
gp140nat.tpa1 | 250 | 20 | 270
gp140nat.tpa2 | 192 | 34 | 226
gp120mod.wtLmod | 6186 | 4576 | 10762
gp120mod.tpa1 | 6932 | 3808 | 10740
gp120mod.wtLnat | 6680 | 4174 | 10854
gp140mod.wtLmod | 1844 | 8507 | 10351
gp140mod.tpa1 | 1854 | 2925 | 4779
gp140mod.wtLnat | 1532 | 3015 | 4547
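A short Python sketch can re-derive the Table 5 totals and express each construct relative to the tpa1 construct of the same Env form. The fold-change comparison is an illustrative summary chosen here, not an analysis reported in the text, and the function name is hypothetical.

```python
# (supernatant ng, lysate ng, reported total ng) for each TV1c8.2 construct in Table 5.
TABLE_5 = {
    "gp140nat.wtL": (532, 149, 681),
    "gp140nat.tpa1": (250, 20, 270),
    "gp140nat.tpa2": (192, 34, 226),
    "gp120mod.wtLmod": (6186, 4576, 10762),
    "gp120mod.tpa1": (6932, 3808, 10740),
    "gp120mod.wtLnat": (6680, 4174, 10854),
    "gp140mod.wtLmod": (1844, 8507, 10351),
    "gp140mod.tpa1": (1854, 2925, 4779),
    "gp140mod.wtLnat": (1532, 3015, 4547),
}


def fold_vs_tpa1(table):
    """Total expression of each construct divided by the tpa1 construct of the same form."""
    folds = {}
    for name, (_, _, total) in table.items():
        form = name.split(".")[0]          # e.g. "gp140nat"
        reference = table[f"{form}.tpa1"][2]
        folds[name] = total / reference
    return folds


for name, (sup, lys, total) in TABLE_5.items():
    assert sup + lys == total, name        # each total is supernatant + lysate
for name, fold in fold_vs_tpa1(TABLE_5).items():
    print(f"{name:16s} {fold:5.2f}x tpa1 total")
```

By this measure, gp140nat.wtL gives about 2.5-fold more total protein than gp140nat.tpa1, and gp140mod.wtLmod about 2.2-fold more than gp140mod.tpa1 (gp140mod.wtLnat is about 0.95-fold). The three gp120mod constructs are within about 1% of one another.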



The sequence-modified TV1c8.2 envelope variant gene cassettes were subcloned into
a Chiron pCMV expression vector for the derivation of stable mammalian cell lines. Stable
CHO cell lines expressing the TV1c8.2 envelope proteins were derived using standard
methods of transfection, methotrexate amplification, and screening. These cell lines were
found to secrete levels of envelope protein that were comparable to those observed for
proteins expressed using the tpa leader sequences. Representative results are shown in Table
6 for two cell line clones expressing the TV1c8.2 gp120; they are compared to two reference
clones expressing SF162 subtype B gp120 derived in a similar fashion but using the tpa
leader. Protein concentrations were determined by densitometry of scanned gels of
semi-purified proteins: standard curves were generated using a highly purified and
well-characterized preparation of SF2 gp120 protein, and the concentrations of the test
proteins were determined from them.



Table 6

CHO cell line | Clone # | Expression (ng/ml)
gp120 SF162 | Clone 65 | 921
gp120 SF162 | Clone 71 | 972
gp120TV1.C8.2 | Clone 159 | 1977
gp120TV1.C8.2 | Clone 210 | 1920



The results were also confirmed by Western Blot Analysis, essentially as described in 
Example 3. 

Example 3
Western Blot Analysis of Expression
A. HIV Type C Coding Sequences

Human 293 cells are transfected as described in Example 2 with pCMV-based vectors
containing native or synthetic HIV Type C expression cassettes. Cells are cultivated for 60
hours post-transfection. Supernatants are prepared as described. Cell lysates are prepared as
follows. The cells are washed once with phosphate-buffered saline, lysed with detergent [1%
NP40 (Sigma Chemical Co., St. Louis, MO) in 0.1 M Tris-HCl, pH 7.5], and the lysate is
transferred into fresh tubes. SDS-polyacrylamide gels (pre-cast 8-16%; Novex, San Diego,
CA) are loaded with 20 µl of supernatant or 12.5 µl of cell lysate. A protein standard is also
loaded (5 µl, broad size range standard; BioRad Laboratories, Hercules, CA).
Electrophoresis is carried out, and the proteins are transferred using a BioRad Transfer
Chamber (BioRad Laboratories, Hercules, CA) to Immobilon P membranes (Millipore Corp.,
Bedford, MA) using the transfer buffer recommended by the manufacturer (Millipore); the
transfer is performed at 100 volts for 90 minutes. The membranes are exposed to
HIV-1-positive human patient serum and immunostained using o-phenylenediamine
dihydrochloride (OPD; Sigma).

Immunoblotting analysis shows that cells containing the synthetic expression cassette 
produce the expected protein at higher per-cell concentrations than cells containing the native 
expression cassette. The proteins are seen in both cell lysates and supernatants. The levels of 
production are significantly higher in cell supernatants for cells transfected with the synthetic 
expression cassettes of the present invention. 

In addition, supernatants from the transfected 293 cells are fractionated on sucrose
gradients. Aliquots of the supernatant are transferred to Polyclear™ ultra-centrifuge tubes
(Beckman Instruments, Columbia, MD), under-laid with a solution of 20% (wt/wt) sucrose,
and subjected to 2 hours of centrifugation at 28,000 rpm in a Beckman SW28 rotor. The
resulting pellet is suspended in PBS, layered onto a 20-60% (wt/wt) sucrose gradient and
subjected to 2 hours of centrifugation at 40,000 rpm in a Beckman SW41Ti rotor.

The gradient is then fractionated into approximately 10 x 1 ml aliquots (starting at the 
top, 20%-end, of the gradient). Samples are taken from fractions 1-9 and are electrophoresed 
on 8-16% SDS polyacrylamide gels. The supernatants from 293/synthetic cells give much 
stronger bands than supernatants from 293/native cells. 
Example 4

In Vivo Immunogenicity of Synthetic HIV Type C Expression Cassettes
A. Immunization

To evaluate the possibly improved immunogenicity of the synthetic HIV Type C
expression cassettes, a mouse study is performed. The plasmid DNA, pCMVKM2 carrying
the synthetic Gag expression cassette, is diluted to the following final amounts in a
total injection volume of 100 µl: 20 µg, 2 µg, 0.2 µg, 0.02 µg and 0.002 µg. To overcome
possible negative dilution effects of the diluted DNA, the total DNA amount in each
sample is brought up to 20 µg using the vector (pCMVKM2) alone. As a control, plasmid
DNA of the native Gag expression cassette is handled in the same manner. Twenty groups of
four to ten Balb/c mice (Charles River, Boston, MA) are intramuscularly immunized (50 µl
per leg, intramuscular injection into the tibialis anterior) according to the schedule in Table
1.
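The dose series above keeps the total DNA amount constant by padding each dilution with empty vector. The minimal Python sketch below tabulates that make-up for each dose, using the 20 µg total, 100 µl volume and 50 µl per leg stated in the text; the function and key names are hypothetical.

```python
def dose_plan(doses_ug, total_dna_ug=20.0, volume_ul=100.0, legs=2):
    """Per-sample make-up for a dose series padded to a constant total DNA amount.

    Each sample receives `dose` ug of test plasmid plus empty vector to reach
    `total_dna_ug`, in `volume_ul`, split evenly across `legs` injection sites.
    """
    plan = []
    for dose in doses_ug:
        filler = total_dna_ug - dose
        if filler < 0:
            raise ValueError("dose exceeds the total DNA amount")
        plan.append({
            "test_plasmid_ug": dose,
            "empty_vector_ug": filler,
            "test_plasmid_per_leg_ug": dose / legs,
            "volume_per_leg_ul": volume_ul / legs,
        })
    return plan


for row in dose_plan([20, 2, 0.2, 0.02, 0.002]):
    print(row)
```

Padding with empty vector keeps the total DNA amount, and hence any non-specific effects of the plasmid backbone, constant across the dose groups.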
Table 1

Group | Gag or Env expression cassette | Amount of Gag or Env plasmid DNA (µg) | Immunized at time (weeks)
1 | Synthetic | 20 | 0, 4
2 | Synthetic | 2 | 0, 4
3 | Synthetic | 0.2 | 0, 4
4 | Synthetic | 0.02 | 0, 4
5 | Synthetic | 0.002 | 0, 4
6 | Synthetic | 20 | 0
7 | Synthetic | 2 | 0
8 | Synthetic | 0.2 | 0
9 | Synthetic | 0.02 | 0
10 | Synthetic | 0.002 | 0
11 | Native | 20 | 0, 4
12 | Native | 2 | 0, 4
13 | Native | 0.2 | 0, 4
14 | Native | 0.02 | 0, 4
15 | Native | 0.002 | 0, 4
16 | Native | 20 | 0
17 | Native | 2 | 0
18 | Native | 0.2 | 0
19 | Native | 0.02 | 0
20 | Native | 0.002 | 0

Time 0 = initial immunization ("week 0").



Groups 1-5 and 11-15 are bled at week 0 (before immunization), week 4, week 6,
week 8, and week 12. Groups 6-10 and 16-20 are bled at week 0 (before immunization) and
at week 4.

B. Humoral Immune Response

The humoral immune response is checked with anti-HIV antibody ELISAs
(enzyme-linked immunosorbent assays) of the mouse sera at 0 and 4 weeks post immunization
(groups 5-12) and, in addition, at 6 and 8 weeks post immunization, i.e., 2 and 4 weeks
post second immunization (groups 1-4).
The antibody titers of the sera are determined using the appropriate anti-HIV
polypeptide (e.g., anti-Pol, anti-Gag, anti-Env, anti-Vif, anti-Vpu, etc.) antibody ELISA.
Briefly, sera from immunized mice are screened for antibodies directed against the HIV
proteins (e.g., p55 Gag protein; an Env protein, e.g., gp160 or gp120; or a Pol protein, e.g.,
p6, prot or RT). ELISA microtiter plates are coated overnight with 0.2 µg of HIV protein per
well and washed four times; blocking is then done with PBS-0.2% Tween (Sigma) for
2 hours. After removal of the blocking solution, 100 µl of diluted mouse serum is added.
Sera are tested at a 1/25 dilution and then in serial 3-fold dilutions. Microtiter plates are
washed four times and incubated with a secondary, peroxidase-coupled anti-mouse IgG
antibody (Pierce, Rockford, IL). ELISA plates are washed and 100 µl of
3,3',5,5'-tetramethylbenzidine (TMB; Pierce) is added per well. The optical density of each
well is measured after 15 minutes. The titers reported are the reciprocal of the serum dilution
that gives half-maximum optical density (O.D.).
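The titer read-out described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure used in the source: it assumes blank-corrected OD values that fall with increasing dilution, takes half of the highest observed OD as the half-maximum, and interpolates linearly on log10(dilution) between the two bracketing points. The function names and the example OD values are hypothetical.

```python
import math


def dilution_series(start=25, fold=3, steps=8):
    """Reciprocal serum dilutions: 1/25 first, then serial 3-fold dilutions."""
    return [start * fold ** i for i in range(steps)]


def half_max_titer(reciprocal_dilutions, ods):
    """Reciprocal dilution at which OD falls to half of the highest observed OD.

    Assumes blank-corrected ODs that decrease with dilution; interpolates
    linearly on log10(dilution) between the two points bracketing half-max.
    Returns None if the tested dilutions do not bracket the half-maximum.
    """
    half = max(ods) / 2.0
    points = list(zip(reciprocal_dilutions, ods))
    for (d1, od1), (d2, od2) in zip(points, points[1:]):
        if od1 >= half >= od2 and od1 != od2:
            frac = (od1 - half) / (od1 - od2)
            log_d = math.log10(d1) + frac * (math.log10(d2) - math.log10(d1))
            return 10 ** log_d
    return None


# Hypothetical OD readings for one serum sample
dilutions = dilution_series()
ods = [2.0, 1.9, 1.6, 1.0, 0.5, 0.2, 0.1, 0.05]
print(round(half_max_titer(dilutions, ods)))  # -> 675
```

Interpolating on a log scale reflects the geometric spacing of a 3-fold series; a fitted sigmoid curve would be a common alternative.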

Synthetic expression cassettes will provide a clear improvement in immunogenicity
relative to the native expression cassettes.

C. Cellular Immune Response 

The frequency of specific cytotoxic T lymphocytes (CTL) is evaluated by a standard
chromium release assay of peptide-pulsed mouse (Balb/c, CB6F1 and/or C3H) CD4 cells.
HIV polypeptide (e.g., Pol, Gag or Env)-expressing, vaccinia virus-infected CD8 cells are
used as a positive control. Briefly, spleen cells (effector cells, E) obtained from the mice
immunized as described above are cultured, restimulated, and assayed for CTL activity
against Gag peptide-pulsed target cells as described (Doe, B., and Walker, C.M., AIDS
10(7):793-794, 1996). Cytotoxic activity is measured in a standard ⁵¹Cr release assay. Target
(T) cells are cultured with effector (E) cells at various E:T ratios for 4 hours, and the average
cpm from duplicate wells is used to calculate percent specific ⁵¹Cr release.
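The percent specific release calculation is not spelled out above; the sketch below uses the conventional formula, 100 × (E - S)/(M - S), computed on duplicate-well means. The spontaneous-release (targets alone, S) and maximum-release (targets lysed with detergent, M) controls are assumptions, since the source does not describe them, and all cpm values are made up.

```python
from statistics import mean


def percent_specific_release(experimental_cpm, spontaneous_cpm, maximum_cpm):
    """Percent specific 51Cr release from duplicate-well cpm lists.

    Uses 100 * (E - S) / (M - S), where E, S and M are the mean cpm of the
    experimental, spontaneous-release and maximum-release wells (assumed controls).
    """
    e = mean(experimental_cpm)
    s = mean(spontaneous_cpm)
    m = mean(maximum_cpm)
    if m <= s:
        raise ValueError("maximum release must exceed spontaneous release")
    return 100.0 * (e - s) / (m - s)


# Hypothetical duplicate-well counts at several E:T ratios
spontaneous = [500, 520]
maximum = [4500, 4480]
by_ratio = {"50:1": [2100, 2140], "10:1": [1300, 1260], "2:1": [640, 600]}
for ratio, wells in by_ratio.items():
    print(ratio, round(percent_specific_release(wells, spontaneous, maximum), 1))
```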

Cytotoxic T-cell (CTL) activity is measured in splenocytes recovered from the mice
immunized with HIV Gag or Env DNA. Effector cells from the Gag or Env DNA-immunized
animals exhibit specific lysis of HIV polypeptide-pulsed SV-BALB (MHC-matched) target
cells, indicative of a CTL response. Target cells that are peptide-pulsed and derived from an
MHC-unmatched mouse strain (MC57) are not lysed.






Thus, synthetic expression cassettes exhibit increased potency for induction of 
cytotoxic T-lymphocyte (CTL) responses by DNA immunization. 

Example 5 

DNA Immunization of Non-Human Primates Using a
Synthetic HIV Type C Expression Cassette
Non-human primates are immunized multiple times (e.g., at weeks 0, 4, 8 and 24)
intradermally, mucosally, or bilaterally by intramuscular injection into the quadriceps, using
various doses (e.g., 1-5 mg) and various combinations of synthetic HIV Type C plasmids. The
animals are bled two weeks after each immunization, and ELISA is performed with isolated
plasma. The ELISA is performed essentially as described in Example 4, except that the second
antibody-conjugate is an anti-human IgG, γ-chain specific, peroxidase conjugate (Sigma
Chemical Co., St. Louis, MO 63178) used at a dilution of 1:500. Fifty ng/ml yeast extract is
added to the dilutions of plasma samples and antibody conjugate to reduce non-specific
background due to preexisting yeast antibodies in the non-human primates.

Further, lymphoproliferative responses to antigen can also be evaluated post- 
immunization, indicative of induction of T-helper cell functions. 

Synthetic plasmid DNAs are expected to be immunogenic in non-human primates.

Example 6 

In vitro expression of recombinant Sindbis RNA and DNA 
containing the synthetic HIV Type C expression cassette 
To evaluate the expression efficiency of the synthetic Pol, Env and Gag
expression cassettes in alphavirus vectors, the selected synthetic expression cassette is
subcloned into both plasmid DNA-based and recombinant vector particle-based Sindbis virus
vectors. Specifically, a cDNA vector construct for in vitro transcription of Sindbis virus
RNA vector replicons (pRSIN-luc; Dubensky et al., J Virol 70:508-519, 1996) is modified
to contain a PmeI site for plasmid linearization and a polylinker for insertion of heterologous
genes. A polylinker is generated using two oligonucleotides that contain the sites XhoI, PmeI,
ApaI, NarI, XbaI, and NotI (XPANXNF and XPANXNR).

The plasmid pRSIN-luc (Dubensky et al., supra) is digested with XhoI and NotI to
remove the luciferase gene insert, blunt-ended using Klenow and dNTPs, and purified from
an agarose gel using GeneCleanII (BIO 101, Vista, CA). The oligonucleotides are annealed to
each other and ligated into the plasmid. The resulting construct is digested with NotI and
SacI to remove the minimal Sindbis 3'-end sequence and A40 tract, and ligated with an
approximately 0.4 kbp fragment from pKSSIN1-BV (WO 97/38087). This 0.4 kbp fragment
is obtained by digestion of pKSSIN1-BV with NotI and SacI, and purification after size
fractionation from an agarose gel. The fragment contains the complete Sindbis virus 3'-end,
an A40 tract and a PmeI site for linearization. This new vector construct is designated
SINBVE.

The synthetic HIV coding sequences are obtained from the parental plasmid by
digestion with EcoRI, blunt-ending with Klenow and dNTPs, purification with GeneCleanII,
digestion with SalI, size fractionation on an agarose gel, and purification from the agarose gel
using GeneCleanII. The synthetic HIV polypeptide-coding fragment is ligated into the
SINBVE vector that has been digested with XhoI and PmeI. The resulting vector is purified
using GeneCleanII and is designated SINBVGag. Vector RNA replicons may be transcribed in
vitro (Dubensky et al., supra) from SINBVGag and used directly for transfection of cells.
Alternatively, the replicons may be packaged into recombinant vector particles by
co-transfection with defective helper RNAs or by using an alphavirus packaging cell line.
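The cloning step above joins a SalI end to an XhoI end and a blunted EcoRI end to a PmeI end. The sketch below checks why that works, using the standard recognition sequences and top-strand cut positions (SalI G^TCGAC, XhoI C^TCGAG, EcoRI G^AATTC, PmeI GTTT^AAAC): SalI and XhoI leave the same TCGA 5' overhang, PmeI cuts blunt, and a SalI/XhoI junction regenerates neither site. The helper names are hypothetical, and the model handles only palindromic sites that leave 5' overhangs or blunt ends; it does not simulate the Klenow fill-in that blunts the EcoRI end.

```python
# enzyme -> (recognition site, top-strand cut offset within the site)
ENZYMES = {
    "SalI": ("GTCGAC", 1),    # G^TCGAC
    "XhoI": ("CTCGAG", 1),    # C^TCGAG
    "EcoRI": ("GAATTC", 1),   # G^AATTC
    "PmeI": ("GTTTAAAC", 4),  # GTTT^AAAC (blunt)
}


def overhang(enzyme):
    """5' overhang left by a palindromic site ('' means a blunt end).

    For a palindromic site cut at `offset` on the top strand, the bottom strand
    is cut at len(site) - offset, so the single-stranded region is
    site[offset:len(site) - offset]. Valid only when offset <= len(site) / 2.
    """
    site, offset = ENZYMES[enzyme]
    return site[offset:len(site) - offset]


def junction(left_enzyme, right_enzyme):
    """Top-strand sequence formed by ligating a left end cut with one enzyme
    to a right end cut with another; raises if the ends are incompatible."""
    if overhang(left_enzyme) != overhang(right_enzyme):
        raise ValueError(f"{left_enzyme} and {right_enzyme} ends are incompatible")
    left_site, left_off = ENZYMES[left_enzyme]
    right_site, right_off = ENZYMES[right_enzyme]
    return left_site[:left_off] + right_site[right_off:]


def recut_by(seq):
    """Enzymes from ENZYMES whose recognition site occurs in seq."""
    return [name for name, (site, _) in ENZYMES.items() if site in seq]


print(overhang("SalI"), overhang("XhoI"), repr(overhang("PmeI")))
print(junction("XhoI", "SalI"), recut_by(junction("XhoI", "SalI")))
```

Because the hybrid CTCGAC junction is cut by neither enzyme, the insert is not released by a later XhoI or SalI digest at that end.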

The DNA-based Sindbis virus vector pDCMVSIN-beta-gal (Dubensky et al., J Virol
70:508-519, 1996) is digested with SalI and XbaI to remove the beta-galactosidase gene
insert, and purified using GeneCleanII after agarose gel size fractionation. The HIV Gag or
Env gene is inserted into pDCMVSIN-beta-gal by digestion of SINBVGag with SalI and
XhoI, purification using GeneCleanII of the Gag-containing fragment after agarose gel size
fractionation, and ligation. The resulting construct is designated pDSIN-Gag, and may be
used directly for in vivo administration or formulated using any of the methods described
herein.

BHK and 293 cells are transfected with recombinant Sindbis RNA and DNA, 
respectively. The supernatants and cell lysates are tested with the Coulter capture ELISA 
(Example 2). 

BHK cells are transfected by electroporation with recombinant Sindbis RNA.
293 cells are transfected using LT-1 (Example 2) with recombinant Sindbis DNA.

Synthetic Gag- and/or Env-containing plasmids are used as positive controls. Supernatants 
and lysates are collected 48h post transfection. 
Type C HIV proteins can be efficiently expressed from both DNA and RNA-based
Sindbis vector systems using the synthetic expression cassettes. 

Example 7 

In Vivo Immunogenicity of Recombinant Sindbis Replicon Vectors
Containing Synthetic Pol, Gag and/or Env Expression Cassettes
A. Immunization 

To evaluate the immunogenicity of recombinant synthetic HIV Type C expression
cassettes in Sindbis replicons, a mouse study is performed. The Sindbis virus DNA vector
carrying synthetic expression cassettes (Example 6) is diluted to the following final
amounts in a total injection volume of 100 µl: 20 µg, 2 µg, 0.2 µg, 0.02 µg and 0.002 µg.
To overcome possible negative dilution effects of the diluted DNA, the total amount of DNA
in each sample is brought up to 20 µg using the Sindbis replicon vector DNA alone. Twelve
groups of four to ten Balb/c mice (Charles River, Boston, MA) are immunized
intramuscularly (50 µl per leg, injected into the tibialis anterior) according to the schedule in
Table 2. Alternatively, Sindbis viral particles are prepared at the following doses: 10³ pfu,
10⁵ pfu and 10⁷ pfu in 100 µl, as shown in Table 3. Sindbis HIV polypeptide particle
preparations are administered to mice by the intramuscular and subcutaneous routes
(50 µl per site).
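The DNA dilution scheme above is simple arithmetic: each 100 µl injection holds 20 µg of DNA in total, made up of the expression construct plus Sindbis replicon vector DNA alone. A short Python sketch of the amounts follows; the variable and function names are illustrative, and `Decimal` is used only to avoid binary rounding in the printed values.

```python
from decimal import Decimal

TOTAL_DNA_UG = Decimal("20")   # total DNA per injection (from the text)
INJECTION_UL = Decimal("100")  # total injection volume (from the text)
DOSES_UG = [Decimal(d) for d in ("20", "2", "0.2", "0.02", "0.002")]


def vector_only_ug(dose_ug, total_ug=TOTAL_DNA_UG):
    """Amount of vector-only (carrier) DNA needed to bring a dose up to total_ug."""
    if dose_ug > total_ug:
        raise ValueError("dose exceeds the total DNA per injection")
    return total_ug - dose_ug


for dose in DOSES_UG:
    filler = vector_only_ug(dose)
    print(f"{dose} ug construct + {filler} ug vector-only DNA "
          f"= {dose + filler} ug in {INJECTION_UL} ul")
```

Every injection therefore carries the same total DNA concentration (0.2 µg/µl), so dose groups differ only in the fraction of DNA that encodes the HIV antigen.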

Table 2

Group   Gag or Env            Gag or Env    Immunized at
        expression cassette   DNA (µg)      week(s)¹
1       Synthetic             20            0, 4
2       Synthetic             2             0, 4
3       Synthetic             0.2           0, 4
4       Synthetic             0.02          0, 4
5       Synthetic             0.002         0, 4
6       Synthetic             20            0
7       Synthetic             2             0
8       Synthetic             0.2           0
9       Synthetic             0.02          0
10      Synthetic             0.002         0

¹ Initial immunization at "week 0".



Table 3

Group   Gag or Env    Viral particle   Immunized at
        sequence      dose (pfu)       week(s)¹
1       Synthetic     10³              0, 4
2       Synthetic     10⁵              0, 4
3       Synthetic     10⁷              0, 4
8       Synthetic     10³              0
9       Synthetic     10⁵              0
10      Synthetic     10⁷              0

¹ Initial immunization at "week 0".



Groups are bled, and both humoral and cellular immune responses (e.g., frequency of
specific CTLs) are assessed essentially as described in Example 4.

Example 8 

Identification and Sequencing of Novel HIV Type C Variants
A full-length HIV Type C clone, designated 8_5_TV1_C.ZA, was isolated
and sequenced. Briefly, genomic DNA from HIV-1 subtype C-infected South African
patients was isolated from PBMC (peripheral blood mononuclear cells) by alkaline lysis and
anion-exchange columns (Qiagen). To obtain full-length clones, the genome was amplified
as two halves that could later be joined in frame within the Pol region using a unique SalI site
present in both fragments. For the amplification, 200-800 ng of genomic DNA was added
to the buffer and enzyme mix of the Expand Long Template PCR System according to the
manufacturer's protocol (Boehringer Mannheim). The primers were designed based on
alignments of known full-length sequences. For the 5' half, a mix of two forward primers
containing a SacI site and either cytosine (SIFCSacTA
5'-GTTTCTTGAGCTCTGGAAGGGTTAATTTACTCCAAGAA-3', SEQ ID NO:38) or thymidine
(SIFTSacTA 5'-GTTTCTTGAGCTCTGGAAGGGTTAATTTACTCTAAGAA-3', SEQ ID NO:39) at
position 20 was used. The reverse primer was also a mix of two primers, with either thymidine
or cytosine at position 13 (S145RTSalTA 5'-GTTTCTTGTCGACTTGTCCATGTATGGCTTCCCCT-3',
SEQ ID NO:40, and S145RCSalTA 5'-GTTTCTTGTCGACTTGTCCATGCATGGCTTCCCT-3',
SEQ ID NO:41), and contained a SalI site. The forward primer for the 3' half was also a
mixture of two primers (S245FASalTA 5'-GTTTCTTGTCGACTGTAGTCCAGGaATATGGCAATTAG-3',
SEQ ID NO:42, and S245FGSalTA 5'-GTTTCTTGTCGACTGTAGTCCAGGgATATGGCAATTAG-3',
SEQ ID NO:43) with a SalI site and adenine or guanine at position 12. The reverse primer had
a NotI site (S2J?ulINotTA 5'-GTTTCTTGCGGCCGCTGCTAGAGATTTTCCACACTACCA-3',
SEQ ID NO:44). After amplification, the PCR products were purified using a 1% agarose gel
and cloned into the pCR-XL-TOPO vector via TA cloning (Invitrogen). Colonies were
checked by restriction analysis and sequence-verified. The full-length sequence was
assembled by combining the sequences of the 5' and 3' halves. The sequence is shown in
SEQ ID NO:33. Important domains are shown in Table A.
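The primer design can be checked programmatically. The sketch below finds the SacI, SalI and NotI recognition sites in the primer sequences given above and locates the base that distinguishes each primer pair. It assumes, as an interpretation rather than a statement from the source, that positions are counted from the first nucleotide after the 7-nt GTTTCTT clamp and 6-nt restriction site; with that convention the SIFCSacTA/SIFTSacTA pair differs at position 20 and the S245FASalTA/S245FGSalTA pair at position 12, matching the text. The S145R pair is omitted because the two sequences as printed differ in length, and the key "NotI reverse" is a hypothetical label for the NotI-containing reverse primer.

```python
SITES = {"SacI": "GAGCTC", "SalI": "GTCGAC", "NotI": "GCGGCCGC"}
PREFIX_LEN = 13  # assumed: 7-nt GTTTCTT clamp + 6-nt SacI or SalI site

# Primer sequences as given in the text (variant bases written in upper case)
PRIMERS = {
    "SIFCSacTA": "GTTTCTTGAGCTCTGGAAGGGTTAATTTACTCCAAGAA",
    "SIFTSacTA": "GTTTCTTGAGCTCTGGAAGGGTTAATTTACTCTAAGAA",
    "S245FASalTA": "GTTTCTTGTCGACTGTAGTCCAGGAATATGGCAATTAG",
    "S245FGSalTA": "GTTTCTTGTCGACTGTAGTCCAGGGATATGGCAATTAG",
    "NotI reverse": "GTTTCTTGCGGCCGCTGCTAGAGATTTTCCACACTACCA",
}


def sites_in(seq):
    """Map each enzyme whose site occurs in seq to the 0-based start of its first match."""
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


def variable_positions(a, b, prefix_len=PREFIX_LEN):
    """1-based positions, counted after the clamp + site prefix, where two primers differ."""
    if len(a) != len(b):
        raise ValueError("primers must have equal length")
    return [i - prefix_len + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


for name, seq in PRIMERS.items():
    print(name, sites_in(seq))
print(variable_positions(PRIMERS["SIFCSacTA"], PRIMERS["SIFTSacTA"]))
print(variable_positions(PRIMERS["S245FASalTA"], PRIMERS["S245FGSalTA"]))
```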

Another clone, designated 12-5_1_TV2_C.ZA, was also sequenced and is shown in
SEQ ID NO:45. Its domains can be readily determined in view of the teachings of the
specification, for example by aligning the sequence to those shown in Table A to find the
corresponding regions in clone 12-5_1_TV2_C.ZA.
As described above (Example 1, Table C), synthetic expression cassettes were
generated using one or more polynucleotide sequences obtained from 8_5_TV1_C.ZA or
12-5_1_TV2_C.ZA.

The polynucleotides described herein have all been deposited at Chiron Corporation, 
Emeryville, CA. 

Although preferred embodiments of the subject invention have been described in 
some detail, it is understood that obvious variations can be made without departing from the 
spirit and the scope of the invention as defined by the appended claims. 
Claims

1. An expression cassette comprising

a polynucleotide sequence encoding a polypeptide including an HIV Pol polypeptide,
wherein the polynucleotide sequence encoding said Pol polypeptide comprises a sequence
having at least 90% sequence identity to the sequence presented in Figure 8 (SEQ ID NO:30),
Figure 9 (SEQ ID NO:31) or Figure 10 (SEQ ID NO:32).

2. An expression cassette comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:46,
(ii) X equals Y, and (iii) Y is at least 97.

3. The expression cassette of claim 2, comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:47,
(ii) X equals Y, and (iii) Y is at least 144.

4. The expression cassette of claim 3, comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:49
or SEQ ID NO:97, (ii) X equals Y, and (iii) Y is at least 300.

5. The expression cassette of claim 4, comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:49,
(ii) X equals Y, and (iii) Y is 2610.

6. The expression cassette of claim 4, comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:97,
(ii) X equals Y, and (iii) Y is 2565.
7. An expression cassette comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:51,
(ii) X equals Y, and (iii) Y is 1494.

8. An expression cassette comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:99,
(ii) X equals Y, and (iii) Y is 1491.

9. An expression cassette comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:55;
SEQ ID NO:57; SEQ ID NO:101; SEQ ID NO:96; SEQ ID NO:134 or SEQ ID NO:135, (ii)
X equals Y, and (iii) Y is at least 60.

10. The expression cassette of claim 9, comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:55;
SEQ ID NO:57; SEQ ID NO:101; SEQ ID NO:96; SEQ ID NO:134 or SEQ ID NO:135, (ii)
X equals Y, and (iii) Y is 624.

11. An expression cassette comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:58;
(ii) X equals Y, and (iii) Y is 354.

12. An expression cassette comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:60;
(ii) X equals Y, and (iii) Y is 876.
13. An expression cassette comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:62;
(ii) X equals Y, and (iii) Y is 3015.

14. An expression cassette comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID
NO:103; (ii) X equals Y, and (iii) Y is 3009.

15. An expression cassette comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:64
or SEQ ID NO:66; (ii) X equals Y, and (iii) Y is 297.

16. An expression cassette comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:68,
(ii) X equals Y, and (iii) Y is 1965.

17. An expression cassette comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:70;
(ii) X equals Y, and (iii) Y is 1977.

18. An expression cassette comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:72
or SEQ ID NO:105, (ii) X equals Y, and (iii) Y is at least 30.

19. The expression cassette of claim 18, comprising
a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:72
or SEQ ID NO:105; (ii) X equals Y, and (iii) Y is 75.

20. An expression cassette comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:74
or SEQ ID NO:107, (ii) X equals Y, and (iii) Y is at least 30.

21. The expression cassette of claim 20, comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:74
or SEQ ID NO:107; (ii) X equals Y, and (iii) Y is 246.

22. An expression cassette comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:76;
(ii) X equals Y, and (iii) Y is 1680.

23. An expression cassette comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:78;
(ii) X equals Y, and (iii) Y is 1668.

24. An expression cassette comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:80,
SEQ ID NO:81 or SEQ ID NO:109; (ii) X equals Y, and (iii) Y is 216.

25. An expression cassette comprising
a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:83;
(ii) X equals Y, and (iii) Y is 93.

26. An expression cassette comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID
NO:111; (ii) X equals Y, and (iii) Y is 90.

27. An expression cassette comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:85
or SEQ ID NO:113; (ii) X equals Y, and (iii) Y is 579.

28. An expression cassette comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:87;
(ii) X equals Y, and (iii) Y is 288.

29. An expression cassette comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID
NO:115; (ii) X equals Y, and (iii) Y is 287.

30. An expression cassette comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:89
or SEQ ID NO:117; (ii) X equals Y, and (iii) Y is at least 30.

31. The expression cassette of claim 30, comprising
a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:89;
(ii) X equals Y, and (iii) Y is 267.

32. The expression cassette of claim 30, comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID
NO:117; (ii) X equals Y, and (iii) Y is 261.

33. An expression cassette comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:91;
(ii) X equals Y, and (iii) Y is at least 30.

34. The expression cassette of claim 33, comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:91;
(ii) X equals Y, and (iii) Y is 321.

35. An expression cassette comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:93
or SEQ ID NO:94; (ii) X equals Y, and (iii) Y is 309.

36. An expression cassette comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:96;
(ii) X equals Y, and (iii) Y is at least 60.

37. The expression cassette of claim 36, comprising
a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID NO:96;
(ii) X equals Y, and (iii) Y is 624.

38. An expression cassette comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID
NO:119; SEQ ID NO:120; SEQ ID NO:121; SEQ ID NO:122; SEQ ID NO:123; SEQ ID
NO:124; SEQ ID NO:125; SEQ ID NO:126; SEQ ID NO:127; SEQ ID NO:131; SEQ ID
NO:132 or SEQ ID NO:133, (ii) X equals Y, and (iii) Y is at least 60.

39. The expression cassette of claim 38, comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID
NO:119; SEQ ID NO:120; SEQ ID NO:121; SEQ ID NO:122; SEQ ID NO:123; SEQ ID
NO:124; SEQ ID NO:125; SEQ ID NO:126; SEQ ID NO:127; SEQ ID NO:131; SEQ ID
NO:132 or SEQ ID NO:133, (ii) X equals Y, and (iii) Y is at least 300.

40. The expression cassette of claim 39, comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID
NO:123 or SEQ ID NO:124, (ii) X equals Y, and (iii) Y is 2433.

41. The expression cassette of claim 39, comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID
NO:122, (ii) X equals Y, and (iii) Y is 2301.

42. The expression cassette of claim 39, comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID
NO:125; (ii) X equals Y, and (iii) Y is 2517.
43. The expression cassette of claim 39, comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID
NO:126 or SEQ ID NO:127, (ii) X equals Y, and (iii) Y is 2520.

44. The expression cassette of claim 39, comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID
NO:119, (ii) X equals Y, and (iii) Y is 1377.

45. The expression cassette of claim 39, comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID
NO:120 or SEQ ID NO:121, (ii) X equals Y, and (iii) Y is 1839.

46. The expression cassette of claim 39, comprising

a polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least 90% identity to Y contiguous nucleotides of SEQ ID
NO:132 or SEQ ID NO:133, (ii) X equals Y, and (iii) Y is 1890.

47. A polynucleotide comprising the sequence depicted in SEQ ID NO:33 or
fragments derived therefrom.

48. The polynucleotide of claim 47, wherein said fragments comprise coding
sequence for the gene products selected from the group consisting of Gag, Pol, Vif, Vpr, Tat,
Rev, Vpu, Env and Nef.

49. The polynucleotide of claim 48, wherein the fragment comprises a Gag gene
product.

50. The polynucleotide of claim 48, wherein the fragment comprises an Env gene
product.
51. The polynucleotide of claim 50, wherein the Env gene product is gp160, gp140 or
gp120.

52. A polynucleotide comprising the sequence depicted in SEQ ID NO:45 or
fragments derived therefrom.

53. The polynucleotide of claim 52, wherein said fragments comprise coding
sequence for the gene products selected from the group consisting of Gag, Pol, Vif, Vpr, Tat,
Rev, Vpu, Env and Nef.

54. The polynucleotide of claim 53, wherein the fragment comprises a Gag gene
product.

55. The polynucleotide of claim 53, wherein the fragment comprises an Env gene
product.

56. The polynucleotide of claim 55, wherein the Env gene product is gp160, gp140 or
gp120.

57. A polynucleotide comprising the sequence depicted in SEQ ID NO:128 or
fragments derived therefrom.

58. The polynucleotide of claim 57, wherein the fragments comprise coding sequence
for Env gene products gp160, gp140 or gp120.

59. The expression cassette of any of claims 1 to 46, further comprising one or more
nucleic acids encoding one or more viral polypeptides or antigens.

60. The expression cassette of claim 59, wherein the viral polypeptide or antigen is
selected from the group consisting of Gag, Env, vif, vpr, tat, rev, vpu, nef and combinations
thereof.
61. The expression cassette of any of claims 1 to 46, further comprising one or more
nucleic acids encoding one or more cytokines.

62. A recombinant expression system for use in a selected host cell, comprising an
expression cassette of any of claims 1 to 46, and wherein said polynucleotide sequence
further comprises control elements capable of driving expression in the selected host cell.

63. The recombinant expression system of claim 62, wherein said control elements
are selected from the group consisting of a transcription promoter, a transcription enhancer
element, a transcription termination signal, polyadenylation sequences, sequences for
optimization of initiation of translation, and translation termination sequences.

64. The recombinant expression system of claim 62, wherein said transcription
promoter is selected from the group consisting of CMV, CMV+intron A, SV40, RSV,
HIV-Ltr, MMLV-ltr, and metallothionein.

65. A cell comprising an expression cassette of any of claims 1 to 46, and wherein
said polynucleotide sequence further comprises control elements compatible with expression
in the selected cell.

66. The cell of claim 65, wherein the cell is selected from the group consisting of a
mammalian cell, an insect cell, a bacterial cell, a yeast cell, a plant, an antigen presenting cell,
a primary cell, an immortalized cell, and a tumor derived cell.

67. The cell of claim 66, wherein the cell is selected from the group consisting of
BHK, VERO, HT1080, 293, RD, COS-7, and CHO cells.

68. The cell of claim 67, wherein said cell is a CHO cell.

69. The cell of claim 66, wherein the cell is either Trichoplusia ni (Tn5) or Sf9 insect
cells.
70. The cell of claim 66, wherein the antigen presenting cell is a lymphoid cell
selected from the group consisting of macrophage, monocytes, dendritic cells, B-cells,
T-cells, stem cells, and progenitor cells thereof.

71. A composition for generating an immunological response, comprising an
expression cassette of any of claims 1 to 46.

72. The composition of claim 71, further comprising one or more Pol polypeptides.

73. The composition of claim 72, further comprising an adjuvant.

74. A composition for generating an immunological response, comprising an
expression cassette of claim 52.

75. The composition of claim 74, further comprising a Pol polypeptide.

76. The composition of claim 74, further comprising one or more polypeptides
encoded by the nucleic acid molecules of claim 60.

77. The composition of claim 76, further comprising an adjuvant.

78. A method of immunization of a subject, comprising,

introducing a composition of claim 71 into said subject under conditions that are
compatible with expression of said expression cassette in said subject.

79. The method of claim 78, wherein said expression cassette is introduced using a
gene delivery vector.

80. The method of claim 79, wherein the gene delivery vector is a non-viral vector.

81. The method of claim 79, wherein said gene delivery vector is a viral vector.
82. The method of claim 79, wherein said gene delivery vector is selected from the
group consisting of an adenoviral vector, a vaccinia viral vector, an AAV vector, a retroviral
vector, a lentiviral vector and an alphaviral vector.

83. The method of claim 82, wherein said gene delivery vector is a Sindbis
virus-derived vector.

84. The method of claim 82, wherein said gene delivery vector is a cDNA vector.

85. The method of claim 82, wherein said gene delivery vector is a eukaryotic layered
viral initiation system (ELVIS).

86. The method of claim 79, wherein said composition is delivered using a particulate
carrier.

87. The method of claim 79, wherein said composition is coated on a gold or tungsten
particle and said coated particle is delivered to said subject using a gene gun.

88. The method of claim 79, wherein said composition is encapsulated in a liposome
preparation.

89. The method of claim 79, wherein said subject is a mammal.

90. The method of claim 89, wherein said mammal is a human.

91. A method of generating an immune response in a subject, comprising:
providing an expression cassette of any of claims 1 to 46,
expressing said polypeptide in a suitable host cell,
isolating said polypeptide, and
administering said polypeptide to the subject in an amount sufficient to elicit an
immune response.

92. A method of generating an immune response in a subject, comprising
introducing into cells of said subject an expression cassette of any one of claims 1 to 46, under conditions that permit the expression of said polynucleotide and production of said
polypeptide, thereby eliciting an immunological response to said polypeptide.

93. The method of claim 92, wherein the method further comprises co-administration
of an HIV polypeptide. 

94. The method of claim 93, wherein co-administration of the polypeptide to the 
subject is carried out before introducing said expression cassette. 

95. The method of claim 93, wherein co-administration of the polypeptide to the 
subject is carried out concurrently with introducing said expression cassette. 

96. The method of claim 93, wherein co-administration of the polypeptide to the
subject is carried out after introducing said expression cassette. 

97. The expression cassette of claim 59, wherein the viral polypeptide or antigen is 
selected from the group consisting of polypeptides derived from hepatitis B, hepatitis C and 
combinations thereof. 

Ck£jVF110965JBW w m<xI

atgggcgcccgcgccacx^tcctgcgc^^ 
tgcgcco:ggcggc^gaagtgctacatgatgaagcacct^^ 

GGAGAAGTTCGCCCTGAACCCCGGCCTGCTGGAGACCAGOT 

CGCCAGCTGCACCCCGCCCTGC^^ 

TGGCCACCCTGTACTGCGTG^ 

CAAGATCGAGGAGGAGCAGAACAAGTGCCAGCAGAAGATC 

MGGGCAAGGTGAGCX»GAACTACCC^ 

AGGCX&TCAGCCCCCGCACXICTGAAC^ 

CWCXmGGTGATCCCCATGTTCACC^ 

ACGATGTTGAACACCGTGGGCGGCCACCA^ 

ACGAGGAGGCCXSCCGAGTGGGACCGCGTGCA^ 

CCAGATG(X3CGAGCCCCG<KGCAGC^^ 

ATCGCCTGGATGACCAGCAACO^^ 

TCCTGGGCCTGAAC^GATCGTGCGGATGTA 

GGGCCCCAAGGAGCCCTTC^ 

CAGAGCACCC^GGAGGTGAAGAACTGGAT 

CCSACTGCAAGACCATCC^^ 

CG<XlX5CCAGG<XGTGGGCGGCCa^ 

CAGG<XAACACCAGCGTGATGATGCAGAAGAGGAACTTCAA» 

AGTGCTTCAACTGCGGCAAGGAGGGCCtt^ 

GGGCTGCTGGAAGTGGGGC2VAGGAGGGC^CCAGA 

AAC^CCTGGGCAAGATCttM^ 

GCCCCGAGCCCACa3CCXXX:C^ 

GAAGCAG^GAGCAAGGACCGCGAGACCCT^ 

CCCCTGAGCCAGTAA 

Gag>iril0S>67jBW_mod

atgggcgcccgcgcc&gcatcctgcgcggcgagaagctggacaagtgggagaagatccgcc 
tgcgccccggcgggjuvgaagcactacatgctgaagcacctggtgtgggccagccgcgagct 
ggagggcttcgccctgaaccccggcctgctggagacc^kxgaggggtgcaagcaga!ls3 

aagcagctgcagcccgccctgcagaccggcaccgaggagctgcgcagcctgtacaacaccg 

tggccaccctgtactgcgtgcacgccggcatcgaggtccxscgacaccyuvggaggccctgga 

caagatcgaggaggagcagaacaagtcccagcagaagacccagcaggcovaggaggcggac 

ggcaaggtgagccagaactaccccatcgtgcagaacctgcagggccagatggtgcaccagg 

ccatcagcccccgcagcctgaacgcctgggtgaaggtgatcgaggagaaggccttcagccc 

cgaggtgatccccatgttcaccgccctgagcgagggcgccaccccccaggacctgaacacg 

atgtrgaacaccgtgggcggccaccaggccggcatgcagatgctgaaggacaccatcaacg 

aggaggccgccgagtgggaccgc<:tgcaccccgtgcaggccggccccgtggccccgggcca 

gatgcgcgacccccgcggcagcgacatcgccggcgccaccagcaccctggaggagcagatc 

gcctggatgaccagcaacccccccgtgcccgtgggcgacatctacaagcggtggatcatcc 

tgggcctgaacaagatcgtgcggatgtacaggcccgtgagcatcctggacatccgccaggg 

ccccaaggagccctrccgcgac'tacgtggaccgcttcttcaagaccctgcgcgccgagcag 

gccacccaggacgtgaagaactggatgaccgagaccctgctggtgcagaacgccaaccccg 

actgcflagaccatcctgcgcgctctcggccccggcgccaccctggaggagatgatgaccgc 

ctggcagggcgtgggcggccgcggccacaaggcccgcgtgctggccgaggcgatgagccag 

gccaacagcgtgaacatcatgatgcagaagagcaacttcaagggcccccggcgcaacgtca 

agtgcttcaactgcggcaaggagggccacatcgcc^ 

gggctgctggaagtgcggcmggagx3gccac<^gatgaaggactgcaccgatc 

aacttcctgggcaagatctggcxx&gccacaaggg^ 

gcagcgagcccgcc^cccccaco^ca^ccgcxxcc 

ggagaccacccccgcccccaagcaggagcccaaggaccgcgagccctaccgcxsagcqxtg 
accgccctgggcagcctgttcggcagcggccccctgagccagtaa 

Fig. 3



Env_AF110968_C_BW_opt

> signal peptide (1-81)

atgcgcgtgatgggcatcctgaagaactaccagcagtggtggatgtggggcatcctgggcttctggatgctgatca 
tcagcac^gtg^Mg^^ 

gttctgcaccagcgacgccaaggcctacgagaccgaggtgpacaacgtgtgggccacccacgcctgcgtgcccacc 

gaccccaacccccaggagatcgtgctggagaacgtgaccgagaacttcaaovtgtggaagaacgacatggtggacc 

agatgcacgaggacatcatcagcctgtgggaccagagcctgaagccctgcgtgaagctgacccccctgtgcgtgac 

cctgaagtgccgcaacgtgaacgccaccaacaacatcaacagcatgatcgacaacagcaacaagg 

aactgcagcttcaacgtgaccaccgagctgcgcgaccgcaagcaggaggtgcacgccctgttctaccgcctggacg 

tggtgcccctgcagggcaacaacagcaacgagtaccgcctgatcaactgcaacaccagcgccatcacccaggcctg 

ccccaaggtgagcttcgaccccatccccatccactactgcacccccgccggctacgccatcctgaagtgcaacaac 

cagaccttcaacggcaccggcccctgcaacaacgtgagcagcgtgcagtgcgcccacggcatcaagcccgtggtga 

gcacccagctgctgctgaacggcagcctggccaagggcgagatcatcatccgcagcgagaacctggccaacaacgc 

caagatcatcatcgtgcagctgaacaagcccgtgaagatcgtgtgcgtgcgccccaacaacaacacccgcaagagc 

gtgcgcatcggccccggccagaccttctacgccaccggcgagatcatcggcgacatccgccaggcctactgcatca 

tcaacaagaccgagtggaacagcaccctgcagggcgtgagcaagaagctggaggagcacttcagcaagaaggccat 

caagttcgagcccagcagcggcggcgacctggagatcaccacccacagcttcaactgccgcggcgagttcttctac 

tgcgacaccagccagctgttcaacagcacctacagccccagcttcaacggcaccgagaacaagctgaacggcacca 

tcaccatcacctgccgcatcaagcagatcatcaacatgtggcagaaggtgggccgcgccatgtacgccccccccat 

cgccggcaacctgacctgcgagagcaacatcaccggcctgctgctgacccgcgacggcggcaagaccggccccaac 

gacaccgagatcttccgccccggcggcggcgacatgcgcgacaactggcgcaacgagctgtacaagtacaaggtgg 

gp120 (1512) | (1513) gp41
TGGAGATCAAGCCCCTGGGCGTGGCCCCCACCGAGGCCAAGCGCCGCGTGGTGGAGCGCGAGAAGCGCGCCGTGGG 

CATCGGCGCCGTGTTCCTGGGCTTCCTGGGCGCCGCCGGCAGCACCATGGGCGCCGCCAGCATCACCCTGACCGTG 

CAGGCCCGCCTGCTGCTGAGCGGCATCGTGCAGCAGCAGAACAACCTGCTGCGCGCCATCGAGGCCCAGCAGCACC 

TGCTGCAGCTGACCGTGTGGGGCATCAAGCAGCTGCAGACCCGCATCCTGGCCGTGGAGCGCTACCTGAAGGACCA 

GCAGCTGCTGGGCATCTGGGGCTGCAGCGGCAAGCTGATCTGCACCACCGCCGTGCCCTGGAACAGCAGCTGGAGC 

AACCGCAGeCACGACGAGATCTGGGACAACATGACCTGGATGCAGTGGGACCGCGAGATCAACAACTACACCGACA 

CCATCTACCGCCTGCTGGAGGAGAGCCAGAACCAGCAGGAGAAGAACGAGAAGGACCTGCTGGCCCTGGACAGCTG 

gp140 (2025) |

GCAGAACCTGTGGAACTGGTTCAGCATCACCAACTGGCTGTGGTACATCAAGATCTTCATCATGATCGTGGGCGGC 

CTGATCGGCCTGCGCATCATCTTCGCCGTGCTGAGCATCGTGAACCGCGTGCGCCAGGGCTACAGCCCCCTGCCCT 

TCCAGACCCTGACCCCCAACCCCCGCGAGCCCGACCGCCTGGGCCGCATCGAGGAGGAGGGCGGCGAGCAGGACCG 

CGGCCGCAGCATCCGCCTGGTGAGCGGCTTCCTGGCCCTGGCCTGGGACGACCTGCGCAGCCTGTGCCTGTTCAGC 

TACC^CCGCCTGCGCGACTTCATCCTGATCGCCGCCCGCGTGCrrGGAGCTGCraSGCC^ 

TGAAGTACCTGGGCAGCCTGGTGCAGTACTGGGGCCTGGAGCTC^ 

CGCCATCGCCGTGGCCGAGGGCACCGACCGCATCATCGAGTTCATCCA^ 

gp160, gp41 (2547) |
CCCCGCCGCATCCGCCAGGGCTTCGAGGCCGCCCTGCAGTAA 
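Figure 3 labels the Env domain boundaries by nucleotide number. Converting these to codon numbers is simple arithmetic, and it doubles as a consistency check on the OCR'd labels: every "end" position should fall on the third base of a codon and the gp41 start on the first base. A minimal Python sketch, assuming the numbers are 1-based positions counted from the A of the ATG in Env_AF110968_C_BW_opt (the dictionary labels are our own, not the patent's):

```python
# Boundary positions read from the Figure 3 annotations (assumed 1-based,
# counted from the first base of the Env_AF110968_C_BW_opt start codon).
BOUNDARIES = {
    "signal peptide end": 81,
    "gp120 end": 1512,
    "gp41 start": 1513,
    "gp140 end": 2025,
    "gp160/gp41 end": 2547,
}


def codon_number(nt_pos: int) -> int:
    """1-based codon (amino-acid) number that contains nucleotide nt_pos."""
    return (nt_pos - 1) // 3 + 1


def codon_position(nt_pos: int) -> int:
    """Position of nt_pos within its codon: 1, 2 or 3."""
    return (nt_pos - 1) % 3 + 1


for label, pos in BOUNDARIES.items():
    print(f"{label:20s} nt {pos:5d} -> codon {codon_number(pos):4d}, base {codon_position(pos)}")
```

Under these assumptions the signal peptide covers codons 1 to 27, gp120 ends at codon 504 and gp41 begins at codon 505.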

Fig. 4

Env_AF110975_C_BW_opt
q§ctgggcargctgtg§gtgagcgtgt^ 

CAGCGACGCCAAGGCCTACGAGAAGGAGGTGCACAACGTGTGGGCCACCCACGCCTGCGTGCCCACCGACCCCAAC 
CCCCAGGAGATCGAGCTGGACAACGTGACCGAGAACTTCAACATGTGGAAGAACGACATGGTGGACCAGATGCACG
AGGACATCATCAGCCTGTGGGACCAGAGCCTGAAGCCCCGCGTGAAGCTGACCCCCCTGTGCGTGACCCTGAAGTG 
CACCAACTACAGCACCAACTACAGCAACACCATGAACGCCACCAGCTACAACAACAACACCACCGAGGAGATCAAG 

aactgcaccttcaacatgaccaccgagctgcgcgacaagaagcagcaggtgtacgccctgttctacaagctggaca 

TCGTGCCCCTGAACAGCAACAGCAGCGAGTACCGCCTGATCAACTGCAACACCAGCGCCATCACCCAGGCCTGCCC 

CAAGGTGAGCTTCGACCCCATCCCCATCCACTACTGCGCCCCCGCCGGCTACGCCATCCTGAAGTGCAAGAACAAC 

ACCAGCAACGGCACCGGCGCCTGCCAGAACGTGAGCACCGTGCAGTGCACCCACGGCATCAAGCCCGTGGTGAGCA 

CCCCCCTGCTGCTGAACGGCAGCCTGGCCGAGGGCGGCGAGATCATCATCCGCAGCAAGAACCTGAGCAACAACGC 

CTACACCATCATCGTGCACCTGAACGACAGCGTGGAGATCGTGTGCACCCGCCCCAACAACAACACCCGCAAGGGC 

ATCCGCATCGGCCCCGGCCAGACCTTCTACGCCACCGAGAACATCATCGGCGACATCCGCCAGGCCCACTGCAACA 

TCAGCGCCGGCGAGTGGAACAAGGCCGTGCAGCGCGTGAGCGCCAAGCTGCGCGAGCACTTCCCCAACAAGACCAT 

CGAGTTCCAGCCCAGCAGCGGCGGCGACCTGGAGATCACCACCCACAGCTTCAACTGCCGCGGCGAGTTCTTCTAC 

TGCAACACCAGCAAGCTGTTCAACAGCAGCTACAACGGCACCAGCTACCGCGGCACCGAGAGCAACAGCAGCATCA 

TCACCCTGCCCTGCCGCATCAAGCAGATCATCGACATGTGGCAGAAGGTGGGCCGCGCCATCTACGCCCCCCCCAT 

CGAGGGCAACATCACCTGCAGCAGCAGCATCACCGGCCTGCTGCTGGCCCGCGACGGCGGCCTGGACAACATCACC 

ftCCGAGATCTTCCGCCCCCAGGGCGGCGACATGAAGGACAACTGGCGCAACGAGCTGTA^ 

AGATCAAGCCCCTGGGCGTGGCCCCCACCGAGGCCAAGCGCCGCGTGGTGGAGCGCGAGAAGCGCGCCGTGGGCAT 
CGGCGCCGTGATCTTCGGCTTCCTGGGCGCCGCCGGCAGCAACATGGGCGCCGCCAGCATCACCCTGACCGCCCAG 
GCCCGCCAGCTGCTGAGCGGCATCGTGCAGCAGCAGAGCAACCTGCTGCGCGCCATCGAGGCCCAGCAGCACATGC 
TGCAGCTGACCGTGTGGGGCATCAAGCAGCTGCAGGCCCGCGTGCTGGCCATCGAGCGCTACCTGAAGGACCAGCA 
GCTGCTGGGCATCTGGGGCTGCAGCGGCAAGCTGATCTGCACCACCACCGTGCCCTGGAACAGCAGCTGGAGCAAC 
AAGACCCAGGGCGAGATCTGGGAGAACATGACCTGGATGCAGTGGGACAAGGAGATCAGCAACTACACCGGCATCA 



.GAACGAGAAGGACCTGCTGGCCCTGGACAGCCGCAA 
^ACATCAAGATCTTCATCATGATCGTGGGCGGCCTG 
CCTGCGCATCATCTTCGCCGTGCTGAGCATCGTGAACCGCGTGCGCCAGGGCTACAGCCWCTGAGCTTCC 



TCTACCGCCTGCTGGAGGAGAGCCAGAACCAGCAG^tf^CA^^ ^ 

CAACCTGTGGAGCTGGTTCAACATCAGCAACTGGCTGTGGT' 



AGACCCTGACCCCCAACCCCCGCGGCCTGGACCGCCTGGGCCGCATCGAGGAGGAGGGCGGCGAGCAGGACCGCGA 
CCGCAGCATCCGCCTGGTGCAGGGCTTCCTGGCCCTGGCCTGGGACGACCTGCGCAGCCTGTGCCTGTTCAGCTAC 
CACCGCCTGCGCGACCTGATGCTGGTGACCGCCCGCGTGGTGGAGCTGpTGGGCCGCAGCAGCCCCCGCGGCCTGC 
AGCGCGGCTGGGAGGCCCTGAAGTACCTGGGCAGCCTGGTGCAGTACTGGGGCCTGGAGCTGAAGAAGAGCGCCAC 

CAGCCTGCXGGACAGCA^ 
CGCGCCTTCTGCAACATCC 

RGGCCGCGGACAAGGGCAAGGTGAG'



cccccgcaccctgarcocc^gtgaaggtgatcgagg; 



AGAAGGCCTTCAGCCCCGAGGT 



CAGGCCATCAG' 

.CCCCCGGCCAGAAGCAGGAGAGCAAGG 



Figure 5

ATCGGCGOCXfcCGCCACWA^^

CGGCAAGAAGCACTACATGCTGAAGCACCS^GTGTGGG^ 

CCGGCCTGCT^GACCGCCGAGGTCTC 

ACCGAGGAGCTGMCAGCCTGTACAAC^C^^ 

CGACACCAAGGAGGCCCTGGACMGATCGAGKSAGGAGCAGAAC^ 

AGGAGGCCGACGGCAAGGTGAGCCkGAACTACCCCATCGTGCAG 

GCCATCAGCCCCCGCACCCn?GAACGCCTGGGTGAAGGTGATCGAGGAGAAGGCCTTCAGCCCCGAGGTGftT 



CCCCATGTTCACOjCCCTGAGCGAGGGCGCCACCCCCCAGGACCTGAACAC C&TGC rGAACACCGTGGGOT 

G 5 



GCCACCAGGCCGCCATGCAGATGCTG^ 

CCCGTGCAGGCCGGCCCCGTGGCCCCCQGCC^GATGCGCGACCCCCGCGGCAGCGACATCGCCGGCGCCAC 

CAGCAGGGTGCAGGAGCAGATCGCCTGGATGACCAGCAACCCCCCCGTGCCCGTGGGCGACATCTACAAGC 

<||fGGATCATCCTGGGCCTGAAC^ 

GGCCCCAAGGAGCCCTTCCGCGACTACGTGGA 

. GGACGTGAAGAACTGGATGACCGAGACCCTGCTGG^GCAGAACGCCAACCCCGACTGCAAGACCA^CCTGC 



AAGGCCC^GTGCTCGCC<^GG<^ 
CAAG<^CCCCC<j|^<^^ 
CCXJCCCGCAAGAAGGGCTGCTGGAAGTGC^^ 
GCCAACTTCCTGGGCAAGATCTGGCCCAGCGACAA 
GCCCGCCGCCCCXACCG^CCAC^ 
CfcAAGCAGGAGCCCAAGGACX^CGAGCCC^^ 
GGCCCCCTGAGCCAGTAA 

5' SalI site




Prot restriction sites

3' EcoRI site



Insertion mutation at slippery sequence
(tttttta -> ttttttTa)
(shown for native sequence)



mut. catalytic center YMDD -> AP
dbmut primer grip WMGY -> PI



FIGURE 7 




YMDD epitope 

cassette

additional genes/ 
cassettes - MCS 
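Figure 7 flags a single-T insertion at the gag/pol slippery sequence (tttttta -> ttttttTa). Assuming the purpose of this insertion is to place the downstream pol codons in the gag reading frame, which mimics the -1 ribosomal frameshift that normally occurs at this site, the frame arithmetic can be shown on a hypothetical toy sequence. Apart from the heptamer, none of the toy bases come from the patent:

```python
def codons(seq: str, frame: int = 0) -> list[str]:
    """Split seq into complete codons, starting at offset `frame` (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


# Hypothetical toy construct (not from the patent): a start codon, one spacer
# base, the slippery heptamer, then two downstream "pol-like" codons.
NATIVE_SLIP = "TTTTTTA"
MUTANT_SLIP = "TTTTTTTA"  # tttttta -> ttttttTa: one extra T before the A
DOWNSTREAM = "GGGCCC"

native = "ATGA" + NATIVE_SLIP + DOWNSTREAM
mutant = "ATGA" + MUTANT_SLIP + DOWNSTREAM

print(codons(native))     # ['ATG', 'ATT', 'TTT', 'TAG', 'GGC']  frame 0 misses GGG CCC
print(codons(native, 2))  # ['GAT', 'TTT', 'TTA', 'GGG', 'CCC']  -1 frame (offset 2)
print(codons(mutant))     # ['ATG', 'ATT', 'TTT', 'TTA', 'GGG', 'CCC']  now in frame 0
```

The one-base insertion shifts every downstream position by +1, so a codon that started at an offset of 2 (mod 3), the -1 frame, now starts at 0 (mod 3). In this toy the native frame 0 also happens to run into a TAG stop that the insertion removes; that detail belongs to the toy, not to the patent sequence.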

PR975(+) (SEQ ID NO:30)



ACO 



G ATCG AG ATCTGCGGCAAG AAGGCC ATCG^ ^ rTTTfr A ACTTCCCC AT 

GAACATCATCGGCCC 
CAGCCCCATCGAGA( 
TGAAGCAGTGGCCC( 
GAGATGGAGAAGGA 
CCCCGTGTTCGCCAl^^^ 



gXacatcatc^^ 

CAGCCCCATCGAGACCGTGCCCGTGAA 
TGAAGCAGTGGCCCCTGACCGAGGAG/ 
GAGATGGAGAAGGAG^CAAGAT^ 



JGGCCAGCCAGATCTACCCCC " 
jCGCCAAGGCCCTGACCGACi 
3CCGAGAACCGCGAGATCCT" 

:aaggacctggtggccgaga 
rctaccaggagcccttcaag; 
3cccacaccaacgacgtgaa 
3agcatcgtgatctggggca 
ctgggagacctggtggaccg 

CGTGAACACCCCCCCCCTGG 
CGGCGCCGAGACCTTCTACG 



GCCGAGAACCGCGAGATCCTG( 
CAAGGACCTGGTGGCCGAGAT 
TCTACCAGGAGCCCTTCAAGAi 
GCCCACACCAACGACGTGAAG 
GAGCATCGTGATCTGGGGCAA 
CTGGGAGACCTGGTGGACCGA 



TCTACCAGGAGCCCTTCAAGAACC" 
GCCCACACCAACGACGTGAAGCAC 
GAGCATCGTGATCTC3GG<3CAA^^^ 
CIGGGAGACCTCKnGGAOTACTACT^ 



AGGATCGA 



.TTAAAAGCTTCCCGGGGCTAGCACCGGTGAATTC 

PR975YM (SEQ ID NO:31)

^GACGCC^CATG<3CCGAG<KXATGAGCCAG<3CCACCAGCGCCAACATCCTGAT 

Sagogcagcaao^g^ 

GGAGGGCCACATCGCCCX^ 

GCGGC^GGAGC^ACCAGATGAAGGACT(3CACCGAGCGCCAGGCCAACIT 
CG^G^G^ACCTGGCC^TCCCCCAGGGCAAGGCCCGGGAGTTCCCCAGCGAGCAGAA 
CCGCGCCAACAGCCCCACCAGCCGCGAGCTGCAGGTGCGCGGCGACAACCCCCGCA 
GCGAG^)CGGCGCCGAGCGCCAGGGCACCCTGAACTTCCC(XAGATCACCCT 

ACCGGCGCCGACGACACCGTGCTGGAGGAGATGAGCCT^ 

GATCGAGATCTGCGGCAAGAAGGCCATCGGCACCGTGCTGATCGGCCCCACCCCCGT 
GAACATGATCG<3CCGCAACATGCTGACCCAG^ 

CAGCCOCATCGAGACCGTGCCCGTGAAGCTGAAGCCCGGCATGGACGGCCCCAAGG 
TGAAGCAGTGGCCCCTGACCGAGGAGAAGATCAAGGCCCTGACCGCCATCTGCGAG 

gaga?ggagaaggaW:aagatcacc 

accccgccggcctgaagaagaagaagagcgtgaccgtgctggacgtgggcgacgcc 

SjGcXcC^OTGCACCCCGAC 

aggagagctgga^ 

AGCCAGATC1ACCCCGGCATCAAGGTGCGCCAGCTGTGCAAGCTGCTGC 

Ia^cctgaccgacatcgtgcccctgaccgaggaggccgagctggagctggcc^ 
g!accgcgagatcctgggcgagccggtck;acggcgtgtact^^ 

AGACCTGGTGGACCGACTACTGGCAG<K:CACCTGGATCCCCGAGTGGGAGTrCGTGA 
A^ACGCCCCCOTGGTGAAGCTGTGGTACCAGCTGGAGAA^ 

ccgagacotctacgtggacggcgccgccaacc^^ 

g^aot^ccgIccggggccck^^ 

Saagacc^agctgcagggcatccagct 

^GAACATCGTGTGbGACAGGCAGTACGCCCTGG^ 

AGAGCGAC^G^jAGCTGGTGAACCAGATCATCGAGCAGCTGATCAAGAAGGAGAAG 

gtg^cct^^gctgggtgcccgcx^cacaagggcatcggcggcaacgagcagatcga 

CAAG^GOTGATC^ 
^TCGTCATCTACCAGTACAT^^ 

tcgattaaaagcttcccggggctagcaccggtgaattc 
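The 3' end of PR975YM as printed (tcgattaaaagcttcccggggctagcaccggtgaattc; a similar tail closes PR975(+) and PR975YMWM) ends in the EcoRI site noted in Figure 7. Scanning it against standard recognition sequences turns up several further sites, which is consistent with the MCS label there. A minimal sketch; the enzyme list is our own choice, and the recognition sequences are the standard ones rather than anything specified in the document:

```python
# Standard recognition sequences for a few common enzymes (our selection).
SITES = {
    "SalI": "GTCGAC",
    "HindIII": "AAGCTT",
    "SmaI/XmaI": "CCCGGG",
    "NheI": "GCTAGC",
    "AgeI": "ACCGGT",
    "EcoRI": "GAATTC",
}


def find_sites(seq: str) -> dict[str, list[int]]:
    """Return the 0-based start positions of each recognition site found in seq."""
    seq = seq.upper()
    hits: dict[str, list[int]] = {}
    for enzyme, site in SITES.items():
        positions = [i for i in range(len(seq) - len(site) + 1)
                     if seq[i:i + len(site)] == site]
        if positions:
            hits[enzyme] = positions
    return hits


tail = "tcgattaaaagcttcccggggctagcaccggtgaattc"  # 3' end of PR975YM as printed
print(find_sites(tail))
# {'HindIII': [8], 'SmaI/XmaI': [14], 'NheI': [20], 'AgeI': [26], 'EcoRI': [32]}
```

Likewise, the first line of PR975YMWM (GTCGACGCCACCA) starts with GTCGAC, the SalI site that Figure 7 places at the 5' end; `find_sites("GTCGACGCCACCA")` returns `{'SalI': [0]}`.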

PR975YMWM (SEQ ID NO:32)



GTCGACGCCACCAP 



GGCCGAGGCCATGAGCCAGGCCACCAGCGCCAACATCCTGAT 



CCGCGCCAACA(X^^^ 

~ XA 
:AA 

:gc 

JGC 
KAC 
VTG 

tcaagcagtggcccctgaccgaggIga^ 



" 3CTGGAGGAGATGAGCCTGCCCGGCAAGTGGAAGC( 
jCGGCTTCATCAAGGTGCGCCAGTACGACCAGATCC 
\GGCCATCGGCACCGTGCTGATCGGCCCCACCCCCG 
^V;^7^ATr^rnrAACATGCTGACCCAGCTGGGCTGCACCCTGAACTTCCCCAT 

2a^g?cSa2c^ 



CGCGAGKjACCTGGCCTTCCCCCAGGGCAAGGCCCGCGAGT^^ 

XGCGCCAACAGCCCCACC 
3CGAGGCCGGCGCCGAGCC 
\GCGCCCCCTGGTGAGCAT 
ACCGGCGCCGACGACACCC 
CAAGATGATCGGCGGCATC 
GATCGAGATCTGCGGCAAC 
GAACATCATCGGCCGCAAC 
CAGCGCCATCGAGACCGTC 



CGAGATCCTGCGCGAGCCCGTGCACG<XGTGTACTACGACCCCA^ 

3CAGGGH 
lGACCGG 
^CCGAGK 



GGCCGAGATCCAGAAGCAGGGCCACGACCAGTGKJACCTACCAGATCT 

ct!*jgc<;a^ 



CCCCCCTGGTGAAGCTGTGGTACCAGCTra^^ 

accttctacgtg 

CTGGTGAGCAAGGGK^TCGGCAAGGTpCTG^ 



AAAAGCTTCCCGGGGCTAGCACCGGTGAATTC 



FIGURE 10 






8 5 ZA (SEQ ID NO: 33) 



X TGGAAGGGTT AATTTACTCC AAGAAAAGGC -^^SS^c'gSSSc 
^ n.PAA^rTT CTTCCCTGAT TGGCAAAACT ACACACCGGG GCCAGGGGTC AbAlAiLLAL 

S Ssss S5SS ssss s= 

L 5 ™ =SS SSSS S=£ SEES EE 

J« JSSgo cgttccggga ggtgtggtct gggcgggact tgggagtggt caaccctcag 

961 GTAAACAAAT ^^^^^^^^AA SgATAGAG GTACGAGACA 
cSSSS^ SS^S SoiXUKn CAGTGGGGGG RCATCRROCA GCCMSC£A 

™ = s=s ssss =i « 

S =S£ S= = 5SS — 

tSgS^ ggacataaaa caagggccaa aagaaccctt tagagactat gtagaccggt 
SSIIIc cttaagagct gaacaagcta cacaagatgt aaagaattgg atgacagaca 

I'll SSSSS CCAAAATGCG AACCCAGATT GTAAGACCAT TTTAAGAGCA TTAGGACCAG 

SctStt agaagaaatg atoacagcat gtcagggagt gggaggacct agcpataaag 
T 1 ^SStott ggctgaggca atgagccaag caaacagtaa catactagtg cagagaagca 

US A^SaSS ScSaSS ATTATTAAAT GTTTCAACTG TCBGCAAAGTA JGGCACATAG 

llll cSSaattg cagggcccct aggaaaaagg gctgttggaa atgtggacag gaaggacacc 

Ss sssss ssss ssss sssx sss 

i i ^cttc^ot S!S 

=S SS= SEES ESSE ESSES ESSES 
5s SSSS =? SEES = . 

1 ™™**e enaom* e»»n 

*« mfM — 7vr«nr«r7vajvT^r TT PC a ATT AG TCCTATTGAA ACTGTACGA^ iAAftAHAftA 

Si acSgaS gSScS SgSaIIcI Itggccax^ acagaagaaa aaataaaagc 

ATTTGTGAGG AAATGGAGAA GGAAGGAAAA ATTACAAAAA TTGGGCCTGA 
"E AaS^CCAG SS^CAT AAAAAAGAAG GACAGTACTA 

"2 attagtagat ttcagggaac tcaataaaag aactcaagac ttttgggaag ttcaatxagg 
llll aSacScac ccagcaggat taaaaaagaa aaaatcagtg acagxgctag atgtggggga 
IHI SSttt tcagttcctt tagatgaaag cttcaggaaa tatactgcat tcaccatacc 
2941 TAGTATAAAC AATGAAACAC CAGGGATTAG ATATCAATAT AATGTGCTGC CACAGGGATG
3001 GAAAGGATCA CCAGCAATAT TCCAGAGTAG CATGACAAAA ATCTTAGAGC CCTTCAGAGC 
3061 AAAAAATCCA GACATAGTTA TCTATCAATA TATGGATGAC TTGTATGTAG GATCTGACTT 
3121 AGAAATAGGG CAACATAGAG CAAAAATAGA AGAGTTAAGG GAACATTTAT TGAAATGGGG 
3181 ATTTACAACA CCAGACAAGA AACATCAAAA AGAACCCCCA TTTCTTTGGA TGGGGTATGA 
3241 ACTCCATCCT GACAAATGGA CAGTACAACC TATACTGCTG CCAGAAAAGG ATAGTJGGAC 
3301 TGTCAATGAT ATACAGAAGT TAGTGGGAAA ATTAAACTGG GCAAGTCAGA TTTACCCAGG 
3361 GATTAAAGTA AGGCAACTCT GTAAACTCCT CAGGGGGGCC AAAGCACTAA CAGACATAGT 
3421 ACCACTAACT GAAGAAGCAG AATTAGAATT GGCAGAGAAC AGGGAAATTT TAAGAGAACC 
3481 AGTACATGGA GTATATTATG ATCCATCAAA AGACTTGATA GCTGAAATAC . AGAAACAGGG 
3541 GCATGAACAA TGGACATATC AAATTTATCA AGAACCATTT AAAAATCTGA AAACAGGGAA 
3601 GTATGCAAAA ATGAGGACTA CCCACACTAA XGATGTAAAA CAGTTAACAG AGGCAGTGCA 
3661 AAAAATAGCC ATGGAAAGCA TAGTAATATG GGGAAAGACT CCTAAATTTA GACTACCCAT 
3721 CCAAAAAGAA ACATGGGAGA CATGGTGGAC AGACTATTGG CAAGCCACCT GGATCCCTGA 
3 781 GTGGGAGTTT GTTAATACCC CTCCCCTAGT AAAATTATGG TACCAACTAG AAAAAGATCC 
3 841 CATAGCAGGA GTAGAAACTT TCTATGTAGA TGGAGCAACT AATAGGGAAG CTAAAATAGG 
3901 AAAAGCAGGG TATGTTACTG ACAGAGGAAG G C AG AAAATT GTTACTCTAA CTAACACAAC 
3961 AAATCAGAAG ACTGAGTTAC AAGCAATTCA GCTAGCTCTG CAGGATTCAG GATCAGAAGT 
4021 AAACATAGTA ACAGACTCAC AGTATGCATT AGGAATCATT CAAGCACAAC CAGATAAGAG 
4081 TGACTCAGAG ATATTTAACC AAATAATAGA ACAGTTAATA AACAAGGAAA GAATCTACCT 
4141 GTCATGGGTA CCAGCACATA AAGGAATTGG GGGAAATGAA CAAGTAGATA AATTAGTAAG 
4201 TAAGGGAATT AGGAAAGTGT TGTTTCTAGA TGGAATAGAT AAAGCTCAAG AAGAGCATGA 
4261 AAGGTACCAC AGCAATTGGA GAGCAATGGC TAATGAGTTT AATCTGCCAC CCATAGTAGC 
4321 AAAAGAAATA GTAGCTAGCT GTGATAAATG TCAGCTAAAA GGGGAAGCCA TACATGGACA 
4381 AGTCGACTGT AGTCCAGGGA TATGGCAATT AGATTGTACC CATTTAGAGG GAAAAATCAT 
4441 CCTGGTAGCA GTCCATGTAG CTAGTGGCTA CATGGAAGCA GAGGTTATCC CAGCAGAAAC 
4501 AGGACAAGAA ACAGCATATT TTATATTAAA ATTAGCAGGA AGATGGCCAG TCAAAGTAAT 
4561 ACATACAGAC AATGGCAGTA ATTTTACCAG TACTGCAGTT AAGGCAGCCT GTTGGTGGGC 
4621 AGGTATCCAA CAGGAATTTG GAATTCCCTA CAATCCCCAA AGTCAGGGAG TGGTAGAATC 
4681 CATGAATAAA GAATTAAAGA AAATAATAGG ACAAGTAAGA GATCAAGCTG AGCACCTTAA 
4741 GACAGCAGTA CAAATGGCAG TATTCATTCA CAATTTTAAA AGAAAAGGGG GAATTGGGGG 
4801 GTACAGTGCA GGGGAAAGAA TAATAGACAT AATAGCAACA GACATACAAA CTAAAGAATT 
4861 ACAAAAACAA ATTATAAGAA TTCAAAATTT TCGGGTTTAT TACAGAGACA GCAGAGACCC 
4921 TATTTGGAAA GGACCAGCCG AACTACTCTG GAAAGGTGAA GGGGTAGTAG TAATAGAAGA 
4981 TAAAGGTGAC ATAAAGGTAG TACCAAGGAG GAAAGCAAAA ATCATTAGAG ATTATGGAAA 
5041 ACAGATGGCA GGTGCTGATT GTGTGGCAGG TGGACAGGAT GAAGATTAGA GCATGGAATA 
5101 GTTTAGTAAA GCACCATATG TATATATCAA GGAGAGCTAG TGGATGGGTC TACAGACATC 
5161 ATTTTGAAAG CAGACATCCA AAAGTAAGTT CAGAAGTACA TATCCCATTA GGGGATGCTA 
5221 GATTAGTAAT AAAAACATAT TGGGGTTTGC AGACAGGAGA AAGAGATTGG CATTTGGGTC 
5281 ATGGAGTCTC CATAGAATGG AGACTGAGAG AATACAGCAC ACAAGTAGAC CCTGACCTGG 
5341 CAGACCAGCT AATtCACATG CATTATTTTG ATTGTTTTAC AGAATCTGCC ATAAJ3ACAAG 
5401 CCATATTAGG ACACATAGTT TTTCCTAGGT GTGACTATCA AGCAGGACAT AAGAAGGTAG 
5461 GATCTCTGCA ATACTTGGCA CTGACAGCAT TGATAAAACC AAAAAAGAGA AAGCCACCTC 
5521 TGCCTAGTGT TAGAAAATTA GTAGAGGATA GATGGAACGA CCCCCAGAAG ACCAGGGGCC 
5581 GCAGAGGGAA CCATACAATG AATGGACACT AGAGATTCTA GAAGAACTCA AGCAGGAAGC 
5641 TGTCAGACAC TTTCCTAGAC CATGGCTCCA TAGCTTAGGA CAATATATCT ATGAAACCTA 
5701 TGGGGATACT TGGACGGGAG TTGAAGCTAT AATAAGAGTA CTGCAACAAC TACTGTTCAT 
5761 TCATTTCAGA ATTGGATGCC AACATAGCAG AATAGGCATC TTGCGACAGA GAAGAGCAAG 
5821 AAATGGAGCC AGTAGATCCT AAACTAAAGC CCTGGAACCA TCCAGGAAGC CAACCTAAAA 
5881 CAGCTTGTAA TAATTGCTTT TGCAAACACT GTAGCTATCA TTGTCTAGTT TGCTTTCAGA 
5941 CAAAAGGTTT AGGCATTTCC TATGGCAGGA AGAAGCGGAG ACAGCGACGA AGCGCTCGTC
6001 CAAGTGGTGA AGATCATCAA AATCCTCTAT CAAAGCAGTA AGTACACATA GTAGATGTAA 
6061 TGGTAAGTTT AAGTTTATTT AAAGGAGTAG ATTATAGATT AGGAGTAGGA GCATTGATAG 
6121 TAGCACTAAT CATAGCAATA ATAGTGTGGA CCATAGCATA TATAGAATAT AGGAAATTGG 
6181 TAAGACAAAA GAAAATAGAC TGGTTAATTA AAAGAATTAG GGAAAGAGCA GAAGACAGTG 
6241 GCAATGAGAG TGATGGGGAC ACAGAAGAAT TGTCAACAAT GGTGGATATG GGGCATCTTA 
6301 GGCTTCTGGA TGCTAATGAT TTGTAACACG GAGGACTTGT GGGTCACAGT CTACTATGGG 
6361 GTACCTGTGT GGAGAGAAGC AAAAACTACT CTATTCTGTG CATCAGATGC TAAAGCATAT 
6421 GAGACAGAAG TGCATAATGT CTGGGCTACA CATGCTTGTG TACCCACAGA CCCCAACCCA 
6481 CAAGAAATAG TTTTGGGAAA TGTAACAGAA AATTTTAATA TGTGGAAAAA TAACATGGCA 
6541 GATCAGATGC ATGAGGATAT AATCAGTTTA TGGGATCAAA GCCTAAAGCC VATGTGTAAAG 
6601 TTGACCCCAC TCTGTGTCAC TTTAAACTGT ACAGATACAA ATGTTACAGG TAATAGAACT 
6661 GTTACAGGTA ATACAAATGA TACCAATATT GCAAATGCTA CATATAAGTA TGAAGAAATG 
6721 AAAAATTGCT CTTTCAATGC AACCACAGAA TTAAGAGATA AGAAACATAA AGAGTATGCA 
6781 CTCTTTTATA AACTTGATAT AGTACCACTT AATGAAAATA GTAACAACTT TACATATAGA 
6841 TTAATAAATT GCAATACCTC AACCATAACA CAAGCCTGTC CAAAGGTCTC TTTTGACCCG 
6901 ATTCCTATAC ATTACTGTGC TCCAGCTGAT TATGCGATTC TAAAGTGTAA TAATAAGACA 
6961 TTCAATGGGA CAGGACCATG TTATAATGTC AGCACAGTAC AATGTACACA TGGAATTAAG 
7021 CCAGTGGTAT CAACTCAACT ACTGTTAAAT GGTAGTCTAG CAGAAGAAGG GATAATAATT 
7081 AGATCTGAAA ATTTGACAGA GAATACCAAA ACAATAATAG TACATCTTAA TGAATCTGTA 
7141 GAGATTAATT GTACAAGGCC CAACAATAAT ACAAGGAAAA GTGTAAGGAT AGGACCAGGA 
7201 CAAGCATTCT ATGCAACAAA TGACGTAATA GGAAACATAA GACAAGCACA TTGTAACATT 
7261 AGTACAGATA GATGGAATAA AACTTTACAA CAGGTAATGA AAAAATTAGG AGAGCATTTC 
7321 CCTAATAAAA CAATAAAATT TGAACCACAT GCAGGAGGGG ATCTAGAAAT TACAATGCAT 
7381 AGCTTTAATT GTAGAGGAGA ATTTTTCTAT TGCAATACAT CAAACCTGTT TAATAGTACA 
7441 TACTACCCTA AGAATGGTAC ATACAAATAC AATGGTAATT CAAGCTTACC CATCACACTC 
7501 CAATGCAAAA TAAAACAAAT TGTACGCATG TGGCAAGGGG TAGGACAAGC AATGTATGCC 
7561 CCTCCCATTG CAGGAAACAT AACATGTAGA TCAAACATCA CAGGAATACT ATTGACACGT 
7621 GATGGGGGAT TTAACAACAC AAACAACGAC ACAGAGGAGA CATTCAGACC TGGAGGAGGA 
7681 GATATGAGGG ATAACTGGAG AAGTGAATTA TATAAATATA AAGTGGTAGA AATTAAGCCA 
7741 TTGGGAATAG CACCCACTAA GGCAAAAAGA AGAGTGGTGC AGAGAAAAAA AAGAGCAGTG 
7801 GGAATAGGAG CTGTGTTCCT TGGGTTCTTG GGAGCAGCAG GAAGCACTAT GGGCGCAGCG 
7861 TCAATAACGC TGACGGTACA GGCCAGACAA CTGTTGTCTG GTATAGTGCA ACAGCAAAGC 
7921 AATTTGCTGA AGGCTATAGA GG CGCAACAG CATATGTTGC AACTCACAGT CTGGGGCATT 
7981 AAGCAGCTCC AGGCGAGAGT CCTGGCTATA GAAAGATACC TAAAGGATCA ACAGCTCCTA 
8041 GGGATTTGGG GCTGCTCTGG AAGACTCATC TGCACCACTG CTGTGCCTTG GAACTCCAGT 
8101 TGGAGTAATA AATCTGAAGC AGATATTTGG GATAACATGA CTTCGATGCA GTGGGATAGA 
8161 GAAATTAATA ATTACACAGA AACAATATTC AGGTTGCTTG AAGACTCGCA AAACCAGCAG 
8221 GAAAAGAATG AAAAAGATTT ATTAGAATTG GACAAGTGGA ATAATCTGTG GAATTGGTTT 
8281 GACATATCAA ACTGGCTGTG GTATATAAAA ATATTCATAA TGATAGTAGG AGGCTTGATA 
8341 GGTTTAAGAA TAATTTTTGC TGTGCTCTCT ATAGTGAATA GAGTTAGGCA GGGATACTCA 
8401 CCTTTGTCAT TTCAGACCCT TACCCCAAGC CCGAGGGGAC TCGACAGGCT CGQA^GGAATC 
8461 GAAGAAGAAG GTGGAGAGCA AGACAGAGAC AGATCCATAC GATTGGTGAG CGGATTCTTG 
8521 TCGCTTGCCT GGGACGATCT GCGGAGCCTG TGCCTCTTCA GCTACCACCG CTTGAGAGAC 
8581 TTCATATTAA TTGCAGTGAG GGCAGTGGAA CTTCTGGGAC ACAGCAGTCT CAGGGGACTA. 
8641 CAGAGGGGGT GGGAGATCCT TAAGTATCTG GGAAGTCTTG TGCAGTATTG GGGTCTAGAG 
8701 CTAAAAAAGA GTGCTATTAG TCCGCTTGAT ACCATAGCAA TAGCAGTAGC TGAAGGAACA 
8761 GATAGGATTA TAGAATTGGT ACAAAGAATT TGTAGAGCTA TCCTCAACAT ACCTAGGAGA 
8821 ATAAGACAGG GCTTTGAAGC AGCTTTGCTA TAAAATGGGA GGCAAGTGGT CAAAACGCAG 
8881 CATAGTTGGA TGGCCTGCAG TAAGAGAAAG AATGAGAAGA ACTGAGCCAG CAGCAGAGGG 
8941 AGTAGGAGCA GCGTCTCAAG ACTTAGATAG ACATGGGGCA CTTACAAGCA GCAACACACC 
9001 too™ «n« ■— « -™ SSSSS £SS£

9061 TCCAGTCAGA CCTCAGGTAC CTTTAAGACC AAAGGCAAGA 
9121 CTTCTTTTTA AAAGAAAAGG ^CTGGA ^GGTX^TT J RAAACTACAC 
9181 AATCCTTGAT TTGTGGGTCT ATAACACACA AGGCTTCTTC CCTQ TAGTACCAGT 
9241 ATCGGGGGCA GGGGTCCGAT TCCCACTGAC CTTTGGATGG ^CTTCAAGC ™ 

930i tgacccaagg gaggtgaaag aggccaatga aggagaagac aactgtttgc TA^ t 
9361 gagccaacat ogagcagagg atgaagatag agaagtatta ^^ aaaq actgctgaca 
9421 tctagcacac agacacatgg cccgcgagct aca^^ ccgggaggtg tggtctgggc 

9481 CAGAAGGGAC TTTCCGCCTG GGACTTTCCA CTGGGGCGTT ^g^™ J 
9541 GGGACTTGGG AGTGGTCACC ^AGATGCT GCATATAAGC ^^^^CCCA 
9601 GGGTCTCTCT CGGTAGACCA GATCTGAGCC ^GCTCT ™ CCATCTGTTG 
9661 XGTGACTCTG SSSS SSS ££S£ TAGTGTGGAA AATCTCTAGC 



9721 
9781 A 

SEQ ID NO:34

CACCAAATGAAAGACTGTACTGAGAGOCAGGCTAA 

975Pol wt until 6aa Int: (SEQ ID NO:35)
TTTTITACKXhAAGATTTGGCCT 

CAGAACAGAGCCAACAGCCCCACCAGCAGAGAGCTTCAAGTTCGAGGAGACAACCC 

CCGCTCCGAAGCAGGAGCCGAAAGACAGGGAACCCnrTAATTTCCCTCAAATCACTCT 

TTGGCAGCGACCCCTTGTCTCAATAAAAGTAGGGGGTCAAATAAAGGAGGCTCTCTT 

AGACACAGGAGCTGATGATACAGTATTAGAAGAAATGAGTTTGCCAGGAAAATGGA 

AACCAAAAATGATAGGAGGAATTGGAGGTrrTATCAAAGTAAGACAGTATGATCAA 

ATACTrATAGAAATTTGTGGAAAAAAGGCTATAGGTACAGTATTAATAGGACCTACA 

CCTGTCAACATAATTGGAAGGAATATGTTGACTCAGCTTGGATGCACACTAAATTTT 

CCAATTAGTCCCATTGAAACTGTGCCAGTAAAATTAAAGCCAGGAATGGATGGCCCA 

AAGGTTAAACAATGGCCATTGACAGAAGAGAAAATAAAA.GCATTAACAGCAATTTG 

TGAAGAAATGGAGAAAGAAGGAAAAATTACAAAAATTGGGCCTGAAAATCCATATA 

ACACTCCAGTATTTGCCATAAAAAAGAAGGACAGTACTAAGTGGAGAAAGTTAGTA 

GATITCAGGGAACTrAATAAAAGAACTCAAGACTTTTGGGAAGTTCAATTAGGAATA 

CCACACCCAGCAGGGTTAAAAAAGAAAAAATCAGTGACAGTACTGGATGTGGGGGA 

TGCATATTTTTCAGTTCCTITAGATGAGGACTTCAGGAAATATACTGCATTCACCATA 

CCTAGTATAAACAATGAAACACCAGGGATTAGATATCAATATAATGTGCTTCCACAG 

GGATGGAAAGGATCACCATCAATATTCCAGAGTAGCATGACAAAAATCTTAGAGCC 

CTTTAGAGCAAGAAATCCAGAAATAGTCATCTATCAATATATGGATGACTTGTATGT 

AGGATCTGACTTAGAAATAGGGCAACATAGAGCAAAAATAGAGGAGTTAAGAAAAC 

ATCTGTTAAGGTGGGGATTrACCACACCGGACAAGAAACATCAGAAAGAACCCCCA 

TTTCTITGGATGGGGTATGAACTCCATCCTGACAAATGGACAGTACAGCCTATAGAG 

TTGCCAGAAAAGGAAAGCTGGACTGTCAATGATATACAGAAGTTAGTGGGAAAATT 

AAATrGGGCCAGTCAGATTTACCCAGGAATTAAAGTAAGGCAACTTTGTAAACTCCT 

TAGGGGGGCCAAAGCACTAACAGATATAGTACCACTAACTGAAGAAGCAGAATTAG 

AATTGGCAGAGAACAGGGAAATTCTAAGAGAACCAGTACATGGAGTATATTATGAC 

CCATCAAAAGACrTGGTAGCTGAAATACAGAAACAGGGGCATGACCAATGGACATA 

TCAAATTTACCAAGAACCATTCAAAAACCTGAAAACAGGGAAGTATGCAAAAATGA 

GGACTGCCCACACTAATGATGTAAAACAGTrAACAGAGGCAGTGCAAAAAATAGCT 

ATGGAAAGCATAGTAATATGGGGAAAGACTCCTAAATTTAGACTACCCATCCAAAA 

AGAAACATGGGAGACATGGTGGACAGACTATTGGCAAGCCACCTGGATTCCTGAGT 

GGGAGTTTGTTAATACCCCTCCCTTAGTAAAATrATGGTACCAGCTAGAGAAAGAAC 

CCATAATAGGAGCAGAAACTTrCTATGTAGATGGAGCAGCTAATAGGGAAACTAAA 

ATAGGAAAAGCAGGGTATGTTACTGACAGAGGAAGGCAGAAAATTGTTTCTCTAAC 

AGAAACAACAAATCAGAAGACTGAATTACAAGCAATTCAGCTAGCTTTGCAAGATTC 

AGGATCAGAAGTAAACATAGTAACAGACTCACAGTATGCATTAGGAATCATTCAAG 

CACAACCAGATAAGAGTGAATCAGAGTTAGTCAACCAAATAATAGAACAATTAATA 

AAAAAGGAAAAGGTCTACCTGTCATGGGTACCAGCACATAAAGGAATTGGAGGAAA 

TGAACAAATAGATAAATTAGTAAGTAAGGGAATCAGGAAAGTGCTGTTTCTAGATG 

GAATAGAT 

SEQ ID NO:36

GGCGGCATCGTGATCTACCAGTACATGGACGACCTGTACGTGGGCAGCGGCG 
GC 

SEQ ID NO:37

GGIVIYQYMDDLYVGSGG 
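SEQ ID NO:36 and SEQ ID NO:37 correspond to each other. Translated with the standard genetic code, the 54-nt DNA sequence gives the 18-residue peptide, which contains the YMDD motif also named in Figure 7. A minimal Python check; the codon table is the standard genetic code, not something defined in this document:

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


SEQ_ID_36 = "GGCGGCATCGTGATCTACCAGTACATGGACGACCTGTACGTGGGCAGCGGCGGC"
print(translate(SEQ_ID_36))  # GGIVIYQYMDDLYVGSGG  (SEQ ID NO:37)
```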

12 jnZA (SEQ ID NO:45)

ACCCACTGCTTAGGCCT^ 

ATCTGTTGTGTGACT^ 

AGTGTGGAAAATCTCTAGCAGTG^G^ 

ACCAGAGAAGATCTGTGGACGCAGGAC^^ 

GCGAGGGGGGCGACTGGTGAGTA^ 

AGGAGAGAGATGGGTGCGAG^GCGTC^ 

GGGAAAAAATTAGGTTACGGCCAGGGGG 

GTATGGGCAAGCAGAGAGCTGGAAAG^ 

ATCAGACC3GATGTAGACAAATAA^ 




CCTGTTGCACCAGGCCAGATGAGAGAA^ 

CAGCCCTQTCAGC^AQAC^^C^^AA^G^ CAAQAGCTAAAA 
ATGTAGACC<3OTCTTCAAAA^ 

AGT(3GGAGGACCTAGCCACAAA^GA^^ 
ACAATACAAGTGTAA^TACAGAA^ 

AAATGTTTCAACTGTGGCAGGGAAGGGCACATA^C^^^^^^^^ 
GAAAAG<X<^^G^^^^^^^ AA cKKWAG<3a:AGG 
GTAGCGGGCCAAACAAAGGAGGCTCTTTTAGATACAGGAGCAGATGATACAGTACT

AGAAGAAATAAACTTGCCAGGAAAATGGAAACCAAAAATGATAGGAGGAATTGGA 

GGTTTTATCAAAGTAAGACAGTATGATCAAATACTTATAGAAATTTGTGGAAAAAGG 

GCTATAGGTACAGTATTAGTAGGACCTACACCTGTCAAGATAATTGGAAGAAATCTG 

TTGACTCAGCTTGGATGCACACTAAATTTTCCAATTAGCCCCATTGAAACTGTACCA 

GTAAAATTAAAGCCAGGAATGGATGGCCCAAAGGTTAAACAATGGCCATTGACAGA 

AGAAAAAATAAAAGCATTAACAGAAATTTGTGAGGAAATGGAGAAGGAAGGAAAA 

ATTACAAAAATTGGGCCTGAAAATCCATATAACACTCCAGTATTTGCCATAAAGAAG 

AAGGACAGTACAAAGTGGAGAAAATTAGTAGATTTCAGGGAACTCAATAAAAGAAC 

TCAAGACTTTTGGGAAGTCCAATTAGGAATACCACACCCAGCAGGGTTAAAAAAGA 

AAAAATCAGTGACAGTACTGGATGTGGGAGATGCATATTTTTCAGTCCCTTTAGATG 

AGAGCTTCAGAAAATATACTGCATTCACCATACCTAGTATAAACAATGAAACACCA 

GGGATTAGATATCAATATAATGTTCTTCCACAGGGATGGAAAGGATCACCAGCAA 

TATTCCAGAGTAGCATGACAAGAATCTTAGAGCCCTTTAGAACACAAAACCCAGAA 

GTAGTTATCTATCAATATATGGATGACTTATATGTAGGATCTGACTTAGAAATAGGG 

rAACATAGAGCAAAAATAGAGGAGTTAAGAGGACACCTATTGAAATGGGGATTTAC 

CACACCAGACAAGAAACATCAGAAAGAACCCCCATTTCTTTGGATGGGGTATGAAC 

TCCATCCTGACAAATGGACAGTACAGCCTATACAGCTGCCAGAAAAGGAGAGCTGG 

ACTGTCAATGATATACAGAAGTTAGTGGGAAAGTTAAACTGGGCAAGTCAGATTTA 

CCCAGGGATTAAAGTAAGGCAACTGTGTAAACTCCTTAGGGGAGCCAAAGCACTAA 

CAGACATAGTGCCACTGACTGAAGAAGCAGAATTAGAATTGGCTGAGAACAGGGA 

AATTCTAAAAGAACCAGTACATGGAGTATATTATGACCCATCAAAAGATTTAATAG 

CTGAAATACAGAAACAGGGGAATGACCAATGGACATATCAAATTTACCAAGAACC 

ATTTAAAAATCTGAGAACAGGAAAGTATGCAAAAATGAGGACTGCCCACACTAATG 

ATGTGAAACAGTTAGCAGAGGCAGTGCAAAAGATAACCCAGGAAAGCATAGTAATA 

TGGGGAAAAACTCCTAAATTTAGACTACCCATCCCAAAAGAAACATGGGAGACATG 

GTGGTCAGACTATTGGCAAGCCACCTGGATTCCTGAGTGGGAGTTTGTCAATACCCC 

TCCCCTAGTAAAATTGTGGTACCAGCTGGAAAAAGAACCCATAGTAGGGGCAGAAA 

CnTCTATGTAGATGGAGCAGCCAATAGGGAAACTAAAATAGGAAAAGCAGGGTAT 

GTCACTGACAAAGGAAGGCAGAAAGTTGTTTCCTTCACTGAAACAACAAATCAGAA 

GACTGAATTACAAGCAATTCAGCTAGCTTTGCAGGATTCAGGGCCAGAAGTAAACA 

TAGTAACAGACTCACAGTATGCATTAGGAATCATTCAAGCACAACCAGATAAGAGT 

GAATCAGAATTAGTCAGTCAAATAATAGAACAGTTGATAAAAAAGGAAAAAGTCTA 

CCTATCATGGGTACCAGCACATAAAGGAATTGGAGGAAATGAACAAGTAGACAAAT 

TAGTAAGTAGTGGAATCAGAAAAGTACTGTTTCTAGATGGAATAGATAAAGCTCAA 

GAAGAGCATGAAAAATATCACAGCAATTGGAGAGCAATGGCTAGTGAGTTTAATCT 

GCCACCCATAGTAGCAAAGGAAATAGTAGCCAGCTGTGATAAATGTCAGCTAAAAG 

GGGAAGCCATGCATGGACAAGTCGACTGTAGTCCAGGAATATGGCAATTAGACTGT 

ACACATTTAGAAGGAAAAATCATCCTAGTAGCAGTCCATGTAGCCAGTGGCTACAT 

GGAAGCAGAGGTTATCCCAGCAGAAACAGGACAAGAAACAGCATACTTTATACTAA 

AATTAGCAGGAAGATGGCCAGTCAAAGTAATACATACAGATAATGGCAGTAATTTC 

ACCAGTACCGCAGTTAAGGCAGCCTGTTGGTGGGCAGATATCCAACGGGAATTTGG 

AATTCCCTACAATCCCCAAAGTCAAGGAGTAGTAGAATCCATGAATAAAGAATTAA 
AACAAATTATAAAAAITCAAAAT



tg^aaaacagat^ 

^ATGGCACAGmAGT^GC^CATA. — ^^g^gaagtacaCAT 



AGCACACAAGTAGACCCTGACCTCACAGApCAACT 

TGTTTTGCAGAATCTGCCATAAGGA^ 
GTGTGACTATCAAGCAGGACATAACAAGGTAGGATCT 

CAGCATTGATAAAACCAAAAAAGATAAAGCC^ 

GTAGAGGATAGATGGAACAA^CCA^^ 

CAATGAATGGACACTAGAGCTO 
TTCCTAGACCATGGCTCCATAACTTAGGA^ 

XTCTAAAATGTAATAATAAGAAATTC * 
-ACAGTACAATGTACACATGGAATTA^ 
rGGTAGCCTAGCAGAAGAAGAGATAA 
TCAAAACAATAATAGTACATCTTAATC 



CA^OrACAA^A^CA^^^^^^^Y^^TG 
TGGCAATAATACAAGAAAGAGTGTGAGAATAGGACCAGGACAAGCATTCTATGCA
ACAGGAGACATAATAGGAGATATAAGACAAGCACATTGTAACATTAGTAAAAATGA 
ATGGAATACAACTTTACAAAGGGTAAGTCAAAAATTACAAGAACTCTTCCCTAATA 
GTACAGGGATAAAATTTGCACCACACTCAGGAGGGGACCTAGAAATTACTACACAT 
AGCTTTAATTGTGGAGGAGAATTTTTCTATTGCAATACAACAGACCTGTTTAATAGT 
ACATACAGTAATGGTACATGCACTAATGGTACATGCATGTCTAATAATACAGAGCG 
CATCACACTCCAATGCAGAATAAAACAAATTATAAACATGTGGCAGGAGGTAGGAC 
.GAGCAATGTATGCCCCTCCCATTGCAGGAAACATAACATGTAGATCAAATATTACA 
GGACTACTATTAACACGTGATGGAGGAGATAATAATACTGAAACAGAGACATTCAG 
ACCTGGAGGAGGAGACATGAGGGACAATTGGAGAAGTGAATTATATAAATACA^G 
GTGGTAGAAATTAAACCATTAGGAGTAGCACCCACTGCTGCAAAAAGGAGAGTGGT 
GGAGAGAGAAAAAAGAGCAGTAGGAATAGGAGCTGTGTTCCTTGGGTTCTTGGGAG 
CAGCAGGAAGCACTATGGGCGCAGCATCAATAACGCTGACGGTACAGGCCAGACAA 
TTATTGTCTGGTATAGTGCAACAGCAAAGTAATTTGCTGAGGGCTATAGAGGCGCAA 
CAGCATATGTTGCAACTCACGGTCTGGGGCATTAAGCAGCTCCAGGCAAGAGTCCTG 
GCTATAGAGAGATACCTACAGGATCAACAGCTCCTAGGACTGTGGGGCTGCTCTGG 
AAAACTCATCTGCACCACTAATGTGCTTTGGAACTCTAGTTGGAGTAATAAAACTCA 
AAGTGATATTTGGGATAACATGACCTGGATGCAGTGGGATAGGGAAATTAGTAATT 
" ACACAAACACAATATACAGGTTGGTTGAAGACTCGCAAAGCCAGCAGGAAAGAAA 
TGAAAAAGATTTACTAGCATTGGACAGGTGGAACAATCTGTGGAATTGGTTTAGCAT 
AACAAATTGGCTGTGGTATATAAAAATATTCATAATGATAGTAGGAGGCTTGATAG 
GTITAAGAATAATTriTGCTGTGCTCTCTCTAGTAAATAGAGTTAGGCAGGGATACT 
CACCCTTGTCATTGCAGACCCTTATCCCAAACCCGAGGGGACCCGACAGGCTCGGA 
GGAATCGAAGAAGAAGGTGGAGAGCAAGACAGCAGCAGATCCATTCGATTAGTGA 
GCGGATTCTTGACACTTGCCTGGGACGACCTACGAAGCCTGTGCCTCTTCTGCTACC 
ACCGATTGAGAGACITCATATTAATTGTAGTGAGAGCAGTGGAACTTCTGGGACAC 
AGTAGTCTCAGGGK3ACTGCAGAGGGGGTGGGGAACCCTTAAGTATTTGGGGAGTCT 
TGTGCAATATTGGGGTCTAGAGTTAAAAAAGAGTGCTATTAATCTGCTTGATACTAT 
AGCAATAGCAGTAGCTGAAGGAACAGATAGGATTCTAGAATTCATACAAAACCTTT 
GTAGAGGTATCCGCAACGTACCTAGAAGAATAAGACAGGGCTTCGAAGCAGCTTTG 
CAATAAAATGGGGGGCAAGTGGTCAAAAAGCAGTATAATTGGATGGCCTGAAGTAA 
GAGAAAGAATCAGACGAACTAGGTCAGCAGCAGAGGGAGTAGGATCAGCGTCTCA 
AGACTTAGAGAAACATGGGGCACTTACAACCAGCAACACAGCCCACAACAATGCTG 
CTTGCGCCTGGCTGGAAGCGCAAGAGGAGGAAGGAGAAGTAGGCTTTCCAGTCAGA 
CCTCAGGTACCTITAAGACCAATGACTTATAAAGCAGCAATAGATCTCAGCTTCTTT 
TTAAAAGAAAAGGGGGGACTGGAAGGGTTAATITACrCCAAGAAAAGGCAAGAGAT 
CCTTGATTTGTGGGTTTATAACACACAAGGCTTCTTCCCTGATTGGCAAAACTACAC 
ACCGGGACCAGGGGTCAGATTTCCACTGACGTTTGGATGGTACTTCAAGCTAGAGCC 
AGTCGATCCAAGGGAAGTAGAAGAGGCCAATGAAGGAGAAAACAACTGTTTACTAC 
ACCCTATGAGCCAGCATGGAATGGAGGATGAAGACAGAGAAGTATTAAGATGGAAG 
TTTGACAGTACGCTAGCACGCAGACACATGGCCCGCGAGCTACATCCGGAGTATTAC 
AAAGACTGCTGACACAGAAGGGACTTTCCGCTGGGACTTTCCACTGGGGCGTTCCAG 
GAGGTGTGGTCTGGGCGGGACAGGGGAGTGGTCAGCCCTGAGATGCTGCATATAAG 
CAGCTGCTTTTCGCCTGTACTGGGTCTCTCTAGGTAGACCAGATCTGAGCCCGGGAG 
CTCTCTGGCrATCTAGGGAACCCACTGOTAAGCCTCAATAAAGCTTGCCTTGAGTG
CCTTGAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGATCCCTCAGA 
CCACTTGTGGTAGTGTGGAAAATCTCTAGCA 
>C4_Env_TV1_C_ZA_opt_short (SEQ ID NO: 46)

C^TCACCCTGCAGTGOUVC^TC^G 
CCGGCAACATCACCTGC 
>C4_Env_TV1_C_ZA_wt (SEQ ID NO: 48)
TTACCCATCACACTCCAATGCAAAATAAAACA^ 

CATTGCAGGAAACATAACATGTAGATCAAACATCACAGGAATACTATTGACACGTGATGGGGGA 
>Envgp160_TV1_C_ZAopt (SEQ ID NO: 49)

ATGCGCGTGATGGGCACCCAGAAGAACTGCCAGCAGTGGTGGATC^^ 
CAACACCGAGGACCTGTGGGTGACCGTGTACTACGGC^^ 
GCGACGCCAAGGCCTACGAGACCGAGGTGCACAACGTGTGGGC^ 
GAGATCGTGCTGGGCAACGTGACCGAGAACTTCAACATGTGGAAGAACAACATGGCCGAC 

GAGCCTGTGGGACCAGAGCCTGAAGCCCTGCGTGAAGC^ 
TGACCGGCAACCGCACCGTGACCGGCAACACC^ 
AACTGCAGCTTCAACGCCACCACCGAGCTGCGCGACAAGAAGC^ 
GCCCCTGAACGAGAACAGCAACAACTTCACCTACCGCCTGATCAACTC 

AGGTGAGCTTCGACCCCATCCCCATCCACTACTGCGCCCCCGCCGACTACGCCATCCTGAAGTGC^ 

AACGGCACCGGCCCCTGCTACAACGTGAGCACCGTGCAGTGCACCCACGGCATCAAGCCCGT^ 

GCTGAACGGCAGCCTGGCCGAGGAGGGCATCATCATCCGCAGCGAGAACCTGACCGAGAACACCAA 

ACCTGAACGAGAGTOTGGAGATCAACTGCACCCGCCCCAACAACAACACCCGCMGAG 

GCCTTCTACGCCACCAACGACGTGATCGGCAACATCra 

CCTGCAGCAGGTGATGAAGAAGCTGGGCGAGCACTTC 

TGGAGATCACCATGCACAGCTTCAACTGCCGCGGCGAGTTCTTCTACTGCAACACCAGCAACC^ 
TACCCCAAGAACGGCACCTACAAGTACAACGGCAACAGCAGCCTGCCCATCACCCTG 
GCGCATGTGGCAGGGCGTGGGCCAGGCCATGTACGCCCCCCCCATCGCCGGCAACATCACCTGCCG^^ 
GCATCCTGCTGACCCGCGACGGCGGCTTCAACAACACCAACAACGACACCGAGGAGACCTTCCGCCCCGGCGGC 

ATGCGCGACAACTGGCGCAGCGAGCTGTACAAGTACAAGGTC 
CAAGCGCCGCGTGGTGCAGCGCAAGAAGCGCGCCGTGGG 
GCACCATGGGCGCCGCCAGCATCACCCTGACCGTGCAGGCCCGC^
CTGCTGAAGGCCATCGAGGCCCAGCAGCACATGCTGCAGCTGACCX3 

GGCCATCGAGCGCTACCTGAAGGACCAGCAGCTGCTGGGCATCTGGGGCTGCAGCGGCCGCCTGATCTGCACCACCGCCG 

TGCCCTGGAACAGCAGCTGGAGCAACAAGAGCGAGGCCGACATC 

ATCAACAACTACACCGAGACCATCTTCCGCCTGCTGGAGGACAGCCA 

GGAGCTGGACAAGTGGAACAACCTGTGGAACTGGTTCGACATCAGCAACTGGCTGTGGTACATCA^ 

TCGTGGGCGGCCTGATCGGCCTGCGCATCATCTTCGCCGTGCTGAGCATCGTGAACCGCGTGCGCCAGGGCTACAGCCCC 
CTGAGCTTCCAGACCCTGACCCCCAGCCCCCGCGGCCTGGACCGCCTGGGCGGCATCGAGGAGGAGGGCGGCGAGCAGGA 
CCGCGACCGCAGCATCCGCCTGGTGAGCGGCTTCCTGAGCCTGGCCTGGGACGACCTGCGCAGCCTGTGCCTGTTCAGCT 
ACCACCGCCTGCGCGACTTCATCCTGATCGCCGTGCGCGCCGTGGAGCTGCTC 

CGCGGCTGGGAGATCCTGAAGTACCTGGGCAGCCTGGTGCAGTACTGGGGCCTGGAGCTGAAGAAGAGCGCCATCAGCCC 
CCTGGACACCATCGCCATCGCCGTGGCCGAGGGCACCGACCGCATCATCGAGCTGGTGCAGCGCATCTGCCGCGCCATCC 
TGAACATCCCCCGCCGCATCCGCCAGGGCTTCGAGGCCGCCCTGCTGTAA 
>Envgp160_TV1_C_ZAwt (SEQ ID NO: 50)

^SSaaagI^ 

n a TTf'OTCTTTCAATGC^CCACAGAATTAAGAGATAAGAAACATAAAGAGTATGCA 1 

AATGGGACAGGACCATGTTAlAA^ . T „ TTAGATGTCA ^^ TGAC AGAGAATACCAA^CAATAATAGTAC 
GTTAAATGGTAGTCTAGCAGAAGAAGGGATAM^ 

TTTACAACAGGTAATGAAAAAAiiA^ ACA TCAAACCTGTTTAATAGTACATAC 
TAGAAATTACAATGCATAGCTOTAATTGTAGAGGAGAATTO 



GAATACTATTGACACGTGATGGGGGATTTAACAACACAAACAACGACACAGAGGAGACATTCAGACCTGGA 

~A.TAAA.TAT. 
AGAGCAGT 



ATGAGGGATAACTGGAGAAGTGAATTATATAAATATAAAGTGGTAG^ 

^AAGAAGAGTGGTGCAGAGAAAAAAAAGAGCAGT 
GCACTATGGGGGCAGCGTCAATAACGCTGACGGTACAGGO 



^^^^ 



tcaacatacctaggagaataagacagggctttgaagcagctttgctataa 
>Gag_TV1_C_ZAopt (SEQ ID NO: 51)

ATGGGCGCCCGCGCCAGCATCCTGAGCGGCGGCAAGCTGGACAAGT^
nrA^TACA^GCTGAAGCACCTOT 

A^ScScOTGTACTGCGTGCACAAGGGCATCGAGGTGCGCGACACCAAGGAGGCCCTCGACAAGATCGAGG 

SSgSSStgccagcagaaggcccagcaggccaaggccgccgacgagaagg 

ACGCCCAG^CCAGATGGTGCACCAGGCCATCAGCCCCCGCACCCTGAACGCC 

SggaSSScgcctggatgaccagcaacccccccatccccgto^^ 

^gISg^ScCCCAGC 

r Sgcag?aaSSaggSgcaaccgcatcatcaagtgct^ 

GAGCCCCTGACCAGCCTGAAGAGCCTGTTCGGCAGCGACCCCCTGAGCCAGTAA 



FIGURE 22 



WO 02/04493 PCT/US01/21241


>Gag_TV1_C_ZAwt (SEQ ID NO: 52)

ATGGGTGCGAGAGCGTCAATATTAAGCGGCGGAAAATTAGATAAATGGGAAAG^^ 
ACATTATATGTTAAAAC^TCTAGTATGGGCAAGCAGGGAGCTGGAAAGATTTGC^CTTAACCCT 

CAGAAGGCTGTAAACAAATAATAAAACAGCTACAACCAGCTCTTCAGACAGGAACAGAGGAACTTAGATCATTATTCAAC
ACAGTAGCAACTCTCTATTGTGTACATAAAGGGATAGAGGTACGAGA(^CCAA 

ACAAAACAAATGTCAGCAAAAAGCACAACAGGCAAAAGCAGCTGACGAAAAGGTCAGTCAAAATTATCCTATAGTACAGA

ATGCCCAAGGGCAAATGGTACACCAAGCTATATCTICCTAGAAC^TTGAATGCATGGATAAAA^ 

TTC^TCCAGAGGAAATACCCATGTTTACAGCAT^^ 

AGTGGGGGGACATCAAGCAGCCATGCAAATGTTAAAAGATACCATCAATGAGGAGGCTGCAGAATGGGATAGGACACATC 
C^GTACATGC^GGGCCTGTTGCACCAGGCCAGATGAGAGA 

CAGGAAC^AATAGCATGGATGACAAGTAATCCACCTATTCCAGTAGAAGACATCT^ 

AAATAAAATAGTAAGAATGTATAGCCCTGTTAGCATTTTGGACATAAAACAAGGGCCAAAAGAACCCTTTAGAGACTATG 
TAGACCGGTTCTTTAAAACCTTAAGAGCTGAACAAGCTACACAAGATGTAAAGA^ 

CAAAATGCGAACCCAGATTGTAAGACCATTTTAAGAGCATTAGGACCAGGGGCCTCATTAGAAGAAATGATGACAGC^ 

TCAGGGAGTGGGAGGACCTAGCCATAAAGCAAGAGTGTTGGCTGAGGCAATGAGCCAAGCAAACAGTAACATACTAGTGC 

AGAGAAGCAATTTTAAAGGCTCTAACAGAATTATTAAATGTTTC 

AGGGCCCCTAGGAAAAAGGGCTGTTGGAAATGTGGACAGGAAGGACACCAAATGAAAGACTGTACTGAGAGGCAGGCTAA 
TTTTTTAGGGAAAATTTGGCCTTCCCACAAGGGGAGGCCAGGGAATTTCCTCCAGA^ 

CAGCAGAACCAACAGCCCCACCAGCAGAGAGCTTCAGGTTCGAGGAGACAACCCCCGTGCCGAGGAAGGAGAAAGAGAGG 
GAACCTTTAACTTCCCTCAAATCACTCTTTGGCAGCGACCCCTTGTCTCAATAA 
>Gag_TV1_ZA_MHRopt (SEQ ID NO: 53)

GACATCAAGCAGGGCCCCAAGGAGCCCTTCCGCGACTACGTGGACCGCTTCTTCAAGACC
>Gag_TV1_ZA_MHRwt (SEQ ID NO: 54)

GACATAAAACAAGGGCCAAAAGAACCCTTTAGAGACTATGTAGACCGGTTCTTTAAAACC
>Nef_TV1_C_ZAopt (SEQ ID NO: 55)

ATGGGCGGCAAGTGGAGCAAGCGCAGCATCGTGGGCTGGCCCGCCGTGCGCGAGOT

CGAGGGCGTGGGCGCCGCCAGCCAGGACCTGGACCGCCAC^

CCTGCGCCTGGCTCCAGGCCCAGGAGGAGGACGGCGACGTGGGCTTCCCCGTC

ACCTACAAGAGCGCCGTGGACCTGAGCTTCTTCCTGAAGGAGAAGG^

CCAGGAGATCCTGGACCTGTGGGTGTACAACACCCAGGGCTT^ 

TGCGCTTCCCCCTGACCTTCGGCTGGTGCTTCAAGCTGGTGCCCGTGGACCCCCGCGAGGTG^ 
GAGGACAACTGCCTGCTGCACCCCATGAGCCAGCACGGCGCCG 

(^GCCTGCTGGCCC^CCGCCACATGGCCCGCGAGCTGCACCCCGAGTACTACAAGGACTGCTGA 
>Nef_TV1_C_ZAwt (SEQ ID NO: 56)
ATGGGA0K3CAAGTTCTCAA 

AGAGGGAGTAGGAGCAGCGTCTCAAGACTTAGATAGACATGGGGCA 

CTTGTGCCTGGCTGCAAGCACAAGAGGAGGACGGAGATGTAGG^^ 

ACTTATAAGAGTGCAGTAGATCTCAGCTTCTTTTO 

GCAAGAAATCCTTGATTTGTGGGTCTATAACACACAAGGCT^ 

TCCGATTCCCACTGACCTTTGGATGGTGCTTCAAGCTAGTACCAGTTGAC 

GAAGACAACTGTTTGCTACACCCTATGAGCCAA(^TGGAGCAGAGGATGAAGATAGAGAAGTATTAAAGTGGAA 
CAGCCTTCTAGCACACAGACACATGGCCCGCGAGCTACATCCGGAGTATTACAAAGACTGCTGA 
>NefD125G_TV1_C_ZAopt (SEQ ID NO: 57)
ATGGGCGGCAAGTGGAGCAAGCGCAGCATC^ 

CGAGGGCGTGGGCGCCGCCAGCCAGGACCTGGACCGCCACGGCGCCCTGACCAGCAGCAACA 

CCTGCGCCTGGCTGCAGGCCCAGGAGGAGGACGGCGACGTGGGCTTCCCCGTC^ 

ACCTACAAGAGCGCCGTGGACCTGAGCTTCTTCCTGAAGGAGAAGGGCGGCCTGGAGGGCCTGATCTA

CCAGGAGATCGTGGACCTGTGGGTGTACAACACCCAGGGCTTCTTCCCCGGCTGGCAGAAOT^ 

TGCGCTTCCCCCTGACCTTCGGCTGGTGCTTCAAGCTGGTGCCCGTGGACCCCCGCGAGGTC

GAGGACAACTGCCTGCTGCACCCCATGAGCCAGCACGGCGCCGAGGACGAGGACCGCGAGGTGCTGAA^ 

CAGCCTGCTGGCCCACCGCCACATGGCCCGCGAGCTGCACCCCGAGTACTACAAGGACTGCTGA 
>p15RNaseH_TV1_C_ZAopt (SEQ ID NO: 58)

ACCTTCTACGTGGACGGCGCCACCAACCGCGAGGCCAAGATCGGCAAGGCCGGCTACGTGACCGACCGCGGCCGCC^G^ 

GATCGTGACCCTGACCAACACGACCAAC^ 

AGGTGAACATCGTGACCGACAGCCAGTACGCCCTGGGCATCATC 

AACCAGATCATCGAGCAGCTGATCAACAAGGAGCGCATCTACCTGAGCTGGGTGCCCGCCCACAAGGGCATCGGCGG^
CGAGCAGGTGGACAAGCTGGTGAGCAAGGGCATC 
RACCAAATAATAGAACAGTTAATAAACAAGGAAAIj
TGAACAAGTAGATAAATTAGTAAGTAAGGGAATT 



FIGURE 30 




>p31Int_TV1_C_ZAopt (SEQ ID NO: 60)

CGCAAGGTGCTGTTCCTGGACGX3 

CAACGAGTTCAACCTGCCCCCCATCGT 

TCCACGGCCAGGTMACTGCM^ 

GTGCACGTGGCCAGCGGCTACATGGAGGCCGAGGTGATCCCCGCCGAGACCGGCCAGGAGACCGCCTACTTCATCCTGAA 
GCTGGCCGGCCGCTGGCCCGTGAAGGTGATCCAC^^ 

GCTGGTGGGCCGGCATCCAGCAGGAGTTCGGCATCCCCTACAACCCCCAGAGCCAGGGCGTGGTGGAGA 
GAGCTGAAGAAGATCATCGGCCAGGTGCGCGACCAGGCCGAGCACCTGAAGACCGCCGTGCAGATGGCCGTGT^ 
CAACTTCAAGCGCAAGGGCGGCATCGGCGGCTACAGCGCCGGCGAGCGCATCATCGACAT 
CCAAGGAGCTGCAGAAGCAGATCATCCGC^TCCAGAACT^ 

GGCCCCGCCGAGCTGCTGTGGAAGGGCGAGGGCGTGGTGGTGATCGAGGACAAGGGCGACATCAAGGTGGT 
CAAGGCCAAGATCATCCGCGACTACGGCAAGCAGATGGCCGGCGCCGACTGCGTGGCCGGCGGCCAGGACGAGGAC 
>p31Int_TV1_C_ZAwt (SEQ ID NO: 61)
AGGAAAGTGTTGTTTCTAGATGGAATAGATAAAGCTCM 

TAATGAGTTTAATCTGCCAC CCATAGTAGCAAAAGAAAT A CTAAAAGGGGAAG CCA 

TACATGGACAAGTCGACTGTAGTCCAGGGATATGGCAATTAGATTGTACCCAT^ 

GTCC^TGTAGCTAGTGGCTACATGGAAGCAGAGGTTATCCCAGCAGAAACAGGACAAGAAA 

ATTAGCAGGAAGATGGCCAGTCAAAGTAATACATACAGACAATGGCAGTAATTT^ 

GTTGGTGGGCAGGTATCCAACAGGAATTTGGAATTCCCTACAATCCCCA^

GAATTAAAGAAAATAATAGGACAAGTAAGAGATCAAGCTC 

CAATTTTAAAAGAAAAGGGGGAATTGGGGGGTACAGTGCAGGGGAAAGAATA 

CTAAAGAATTACAAAAACAAATTATAAGAATTCAAAATTTTC 

GGACCAGCCGAACTACTCTGGAAAGGTGAAGGGGTAGTAGTAATAGAAGATAAAGGTGACATAAAGGTAGTACC^ 
GAAAGCAAAAATCATTAGAGATTATGGAAAACAGATGGC^^ 
>Pol_TV1_C_ZAopt (SEQ ID NO: 62)

TTCTTCCGCGAGAACCTGGCCTTCCCCCAGGGCGAGGCCCGCGAGTTCCCCCCCGAGCAGACCCGCGCC^ 

CAGCCGCACCAACAGCCCC^CCAGCCGCGAGCTGCAGGTGCGCGGCGACAACCCCCGCG 

GCACCTTGAACTTCCCCCAGATCACCCTGTGGCAGCGCCCCCTGGTC 

CTGCTGGACACCGGCGCCGACGACACCGTGCTGGAGGAGATCGACCTGCCCGGCAAGTGGAAGCCCAAGATGATCGGCGG 
CATCGGCGGCTTCATCAAGGTGCGCCAGTACGACCAGATCCTGATCGAGATC 
TGGTGGGCCCCACCCCCGTGAAGATC^TCGGCCGCAACCTGCTGACCCAGCTGGGCTGCA 
CCCATCGAGACCGTGCCCGTGAAGCTGAAGCCCGGCAT^ 

GATCAAGGCCCTGACCGCCATCTGCGAGGAGATGGAGAAGGAGGGCAAGATGACCAAGATCGGCCCCGACAACCCCTACA 
ACACCCCCGTGTTCGCCATCAAGAAGAAGGACAGGACCAAGTGGCGCAAGCTGGTGGA^ 

ACCCAGGACTTCTGGGAGGTGCAGCTGGGCATCCCCCACCCCGCCGGCCTGAAGAAGAAGAAGAGCGTGACCGTGGTGGA 
CGTGGGCGACGCCTACTTCAGCGTGCCCCTGGACGAGAGCTTCCGC^ 

ACGAGACCCCCGGCATCCGCTACCAGTACAACGTGCTGCCCCAGGGCTGGAAGGGCAGCCCCGCGATCTTCCAGAGCAGC 
ATGACCAAGATCCTGGAGCCCTTCCGCGCCAAGAACCCCGACATCGTGATCTACCAGTACATGGACGACCTGTACGTGGG 
CAGCGACCTGGAGATCGGCCAGCACCGCGCCAAGATCGAGGAGCTGCGCGAGCACCTGCTGAAGTGGGGCTTCACCACCC 
CCGACAAGAAGCACCAGAAGGAGCCCCCCTTCCTGTGGATGGGCTACGAGCTGCACCCCGACAAGTGGACCGTGCAGCCC 
ATCCTGGTGCCCGAGAAGGACAGCTGGACCGTGAACGACATCCAGAAGCTGGTGGGCAAGCTGAACTGGGCCAGCCAGAT
CTACCCCGGCATCAAGGTGCGCCAGCTGTGCAAGCTGCTGCGCGGCGCCAAGGCCCTGACCGACATCGTGCCCCTGACCG 
AGGAGGCCGAGCTGGAGCTGGCCGAGAACCGCGAGATCCTGCGCGAGCCCGTGCACGGCGTGTACTACGACCCCAGCAAG 
GACCTGATCGCCGAGATCCAGAAGCAGGGCCACGAGCAGTGGACCTACCAGATCTACCAGGAGCCCTTCAAGAACCTGAA
GACCGGCAAGTACGCCAAGATGCGCACCACCCACACCAACGACGTGAAGCAGCTGACCG^ 

TGGAGAGCATCGTGATCTGGGGCAAGACCCCCAAGTTGCGCCTGCCCATCCAGAAGGAGACCTGGGAGACCTGGTGGACC 

GACTACTGGCAGGCCACCTGGATCCCCGAGTGGGAGTTCGTGAACACCCCCCCCCTGGTGAAGCTGTGGTACCAGCTGGA 

GAAGGACCCCATCGCCGGCGTGGAGACCTTCTACGTGGACGGCGCCACCAACCGCGAGGCCAAGATCGGCAAGGCCGGCT

ACGTGACCGACCGCGGCCGCCAGAAGATCGTGACCCTGACCAACACCACCAACCAGAAGACCGAGCTGCAGGCCATCCAG 

CTGGCCCTGCAGGACAGCGGCAGCGAGGTGAACATCGTGACCGACAGCCAGTACGCCCTGGGCATCATCCAGGCCCAGCC

CGACAAGAGCGACAGCGAGATCTTCAACCAGATCATCGAGCAGCTGATCAACAAGGAGCGCATCTACCTGAGCTGGGTGC

CCGCCCACAAGGGCATCGGCGGCAACGAGCAGGTGGACAAGCTGGTGAGCAAGGGCATCCGCAAGGTGCTGTTCCTGGAC 

GGCATCGACAAGGCCCAGGAGGAGCACGAGCGCTACCACAGCAACTGGCGCGCCATGGCCAACGAGTTCAACCTGCCCCC

CATCGTGGCCAAGGAGATCGTGGCCAGCTGCGACAAGTGCCAGCTGAAGGGCGAGGCCATCCACGGCCAGGT 

GCCCCGGCATCTGGCAGCTGGACTGCACCCACCTGGAGGGCAAGATCATCCTGGTGGCCGTGCACGTC 

ATGGAGGCCGAGGTGATCCCCGCCGAGACCGGCCAGGAGACCGCCTACTTCATCCTGAAGCTGGCCGGCCGCTGGCCCGT 

GAAGGTGATCCACACCGACAACGGCAGCAACTTCACCAGCACCGCCGTGAAGGCC 

AGGAGTTCGGCATCCCCTACAACCCCCAGAGCCAGGGCGTGGTGGAGAGCATGAACAAGGAGCTGAAGAAGATCATCGGC

CAGGTGCGCGACCAGGCCGAGCACCTGAAGACCGCCGTGCAGATGGCCGTGTTCATCCACAACTTCAAGCGCAAGGGCGG 

CATCGGCGGCTACAGCGCCGGCGAGCGCATCATCGACATCATCGCCACCGACATCCAGACCAAGGAGCTGCAGAAGCAG 

TCATCCGCATCCAGAACTTCCGCGTGTACTACCGCGACAGCCGCGACCCCATCTGGAAGGGCGCCGCCGAGCTGCTGTGG

AAGGGCGAGGGCGTGGTGGTGATCGAGGACAAGGGCGACATCAAGGTGGTGCCCCGCCGCAAGGCCAAGATCATCCGCGA 

CTACGGCAAGCAGATGGCCGGCGCCGACTGCGTGGCCGGCGGCCAGGACGAGGAC 
>Pol_TV1_C_ZAwt (SEQ ID NO: 63)

^SgaaccaI^gccccaccagca^ 

PAACCTTTAACTTCCCTCWUVrCACTC 
rTCTTAGACACAGGAGCyvGA 

^TTGGAGGTTTTATCAAAGTAAGACAGTATGATCAAATACTTATAGAAATTTGTGGAAAAAAGGCTATAGGTACAGTAT 

™^^cctacaccagtcaacataattggaa<3aaa^ 
StattgIaac^ 

a^taaaagcattaacagcaatttgtgaggaaatggagaaggaaggaaaaattac^ 
a^actccagtatttoccataaaaaagaaggacagtactaagtggagaaaattagtagat 

SSSgac^gggaagttcaa^ . 

^TGGGGGATCCATATTTTTCAGTTCCTTTAGATGAAAGCTTCAGGAAATATACTG 

SgaaIcac'cS 

RTGACAAAAATCT^GAGCCCTTCAGAGCAAAAAATCCAGACATAGTTATCT 

mSga^gaaatagggc^catagagcaaaaatagaagagttaag^^ 

atac^^gSaaaaggatagttggactgtcaatgatatacagaagttagtgggaaaat^ 
ScSagggSaaagtaaggcaactctgtaaactcct 

^cSSJgotgaaatacagaaacaggggcatgaacaatg^ 

SSgG^G^ 
^TOGAAAG^StAGTAATATGGGGAA 

^C^TGGCAAG^ 

™Sccca^ag^^ 

ATOTTACTGACA^AG^AA^GGCAGAAAATTGTTACTCTAACTAAC^ 

agaSSgSactcagagatatttaaccaaataatagaa^ 

SSSJaaag^ 
SSaga^gctcaagaaga^ 

SSaggga^gcaattagattgtacccatttagagggw 
SggScaSSttmcccagcagaaacagga^ 
SaactStaStacagacaatggcagtaattttaccagtactc 
S^ggSttccctacaatccccaaagtcagggagt^tagaatccatgaata^ 

caag^aag^gatcaagctgagcaccttaagacagc^^ 

AA^G^GGGTACAGTGCAGGGGAAAGAATAATAGACATAATAGCAACAGACAm 

^aaIaaScaaaatttc^^ 

AAAGGTGAAGGGGTAGTAGTAATAGAAGATAAAGGTGACATAA 
TTATGGAAAACAGATGGCAGGTGCTGATTGTGTGGCAGGTGGACAGGATGAAGAT 
>Prot_TV1_C_ZAopt (SEQ ID NO: 64)

CCCCAGATCACCCTGTGGCAGCGCCCCCTGGTGAG<^T 
CGCCGACGACACCGTGCTGGAGGAGATCGACCTGCCCGGC2^ 

TCAAGGTGCGCCAGTACGACCAGATCCTGATCGAGATCTGCGGCAAGAAGGCCATCGGCACCGTGCTGGTGGGCCCCACC 
CCCGTGAACATCATCGGCCGCAACCTGCTGACCCAGCTGGGCTGCACCCTGAACTTC 
>Prot_TV1_C_ZAwt (SEQ ID NO: 65)
>Protina_TV1_C_ZAopt (SEQ ID NO: 66)
CCCCAGATCACCCTGTGGCAGCGCCCCCTGGTGAGCATC^ 

CGCCGACGACACCGTGCTGGAGGAGATCGACCTGCCCGGCAAGTGGAAGCCCAAGATGATCGGCGGCATCGGCGGCTT^ 
TCAAGGTGCGCCAGTACGACCAGATCCTGATCGAG 

CCCGTGAACATCATCGGCCGCAACCTGCTGACCCAGCTGGGCTGCACCCTGAACTTC 
>Protina_TV1_C_ZAwt (SEQ ID NO: 67)
CCTCAAATCACTCTCTGGGAX3C 

AGCAGATGATACAGTATTAGAAGAAATAGATTTGCCAGGGAAATGGAAACCAAAAATG
TCAAAGTAAGACAGTATGATCAAATACTTATAGAAATTTC^

CCAGTCAACATAATTGGAAGAAATCTGTTAACTCAGCTTGGATGCACACTAAATTTT
>ProtinaRTmut_TV1_C_ZAopt (SEQ ID NO: 68)
CCCCAGATCACCCTGTGGCAGCGCCCCCT

CGCCGACGACACCGTGCTGGAGGAGATCGACCTGCCCGGCAAGTGGAAGCCCAAGATGATCGGCGGCATCGGCGGCTTCA 

TCAAGGTGCGCCAGTACGACGAGATCCTGATCGAGATCTGCGGCAAGAAGGCCATC 

CCCGTGAACATCATCGGCCGCAACCTGCTGACCCAGCTGGGCTGCACCCTGAACT^ 

GCCCGTGAAGCTGAAGCCCGGCATGGACGGCCCCAAGGTGAAGCAGTGGCCCCTGACCGAGGAGAAGATCAAGGCCCTGA 

CCGCCATCTGCGAGGAGATGGAGAAGGAGGGCAAGATCACCAAGATCGGCCCCGACAA 

GCCATCAAGAAGAAGGACAGCACCAAGTGGCGCAAGCTGGTGGACTTCCGCGAGCTG^ 

GGAGGTGCAGCTGGGCATCCCCCACCCCGCCGGCCTGAAGAAGAAGAAGAGCGTGACCGTGCTGGACGTGGGCGACGCCT 
ACTTCAGCGTGCCCCTGGACGAGAGCTTCCGCAAGTACACCGCCTTC^^ 

ATCCGCTACCAGTACAACGTGCTGCCCCAGGGCTGGAAGGGGAGCCCCGCCATCTTCCAGAGCAGCATGA 

GGAGCCCTTCCGCGCCAAGAACCCCGACATCGTGATCTACCAGGCCCCCCTGTACGTGGGCAGCGACCTGGAGATCGGCC 

AGCACCGCGCCAAGATCGAGGAGCTGCGCGAGCACCTGCTGAAGTGGGGCTTCACCACCGCCGACAAGAAGCACCAGAAG

GAGCCCCCCTTCCTGCCCATCGAGCTGCACCCCGACAAGTGGACCGTGCAGCCCATCCTGCTGCCCGAGAAGGACAGCTG 

GACCGTGAACGAGATCCAGAAGCTGGTGGGCAAGCTGAACTGGGCCAGCCAGATCTACCCCGGCATCAAGGTGCGCCAGC 

TGTGCAAGCTGCTGCGCGGCGCCAAGGCCCTGACCGACATCGTGCCCCTGACCGAGGAGGCCGAGCTGGAGCTGGCCGAG 

AACCGCGAGATCCTGCGCGAGCCCGTGCACGGCGTGTACTACGACCGCAGCAAGGACCTGATCGCCGAGATCCAGAAGCA 

GGGCCACGAGCAGTGGACCTACCAGATCTACCAGGAGCCCTTCAAGAACCTGAAGACCGGCAAGTACGCCAAGATGCGCA 

CCACCCACACCAACGACGTGAAGCAGCTGACCGAGGCCGTGCAGAAGATCGCCATGGAGAGCATCGTGATCTGGGGCAAG 

ACCCCCAAGTTCCGCCTGCCCATCCAGAAGGAGACCTGGGAGACCTGGTGGACCGACTACTGGCAGGCCACCTGGATCCC 

CGAGTGGGAGTTCGTGAACACCCCCCCCCTGGTGAAGCTGTGGTACCAGCTGGAGAAGGACCCCATCGCCGGCGTGGAGA 

CCTTCTACGTGGACGGCGCCACCAACCGCGAGGCCAAGATCGGCAAGGCCGGCTACGTGACCGACCGCGGCCGCCAGAAG 

ATCGTGACCCTGACCAACACCACCAACCAGAAGACCGAGCTGCAGGCCATCCAGCTGGCCCTGCAGGACAGCGGCAGCGA

GGTGAACATCGTGACCGACAGCCAGTACGCCCTGGGCATCATCCAGGCCCAGCCCGACAAGAGCGACAGCGAGATCTTCA 

ACCAGATCATCGAGCAGCTGATCAACAAGGAGCGCATCTACCTGAGCTGGGTGCCCGCCCACAAGGGCATCGGCGGCAAC

GAGCAGGTGGACAAGCTGGTGAGCAAGGGCATCCGCAAGGTGCTG 
>ProtinaRTmut_TV1_C_ZAwt (SEQ ID NO: 69)

CCTCAAATCACTCTTTGGCAGCGACCCCTTGTCTCAATAAAAGT^ CTCTCTTAGCCACAGG

AGCAGATGATACAGTATTAGAAGAAATAGATTTGCCAGGGAAATGGAAACCAAAAATG 

TCAAAGTAAGACAGTATGATCAAATACTTATAGAAATTTC 

CCAGTCAACATAATTGGAAGAAATCTGTTAACTCAGCTTGGATGCACACT

ACCAGTAAAATTAAAACCAGGAATGGATGGCCCAAAGGTCAAACAATGGCC^ 

CAGCAATTTGTGAGGAAATGGAGAAGGAAGGAAAAATTAGAAAAATTGGGC 

GCCATAAAAAAGAAGGACAGTACTAAGTGGAGAAAATTAGTAGATTTCAGGG 

GGAAGTTCAATTAGGAATACCACACCCAGCAGGATTAAAAAAGAAA

ATTTTTCAGTTCCTTTAGATGAAAGCTTCAGGAAATATACTGCATTCACCATAC 

ATTAGATATCAATATAATGTGCTGCCACAGGGATGGAAAGGATCA^

AGAGCCCTTCAGAGCAAAAAATCCAGACATAGTTATCTATCAAGCCCCGTTGTATGTAGGATCTGACTTAGAAATAGGGC
AACATAGAGCAAAAATAGAAGAGTTAAGGGAACATTTATTGAAATGGC^ 
GAACCCCCATTTCTTCCCATCGAACTCCATCCTGACAAATGGACAGTACAACCTATACTGCT 
GACTGTCAATGATATACAGAAGTTAGTGGGAAAATTAAACTC 

TCTGTAAACTCCTCAGGGGGGCCAAAGCACTAACAGACATAGTACCACTAACTGAAGAAGCAGAATTAGAATTGGCAGAG

AACAGGGAAATTTTAAGAGAACCAGTACATGGAGTATATTATGATCGATCAAAAG 

GGGGCATGAACAATGGACATATCAAATTTATCAAGA^^ 

CTACCCACACTAATGATGTAAAACAGTTAAC^^ 

ACTCCTAAATTTAGACTACCCATCCAAAAAGAAACATGGGAGAC^ 

TGAGTGGGAGTTTGTTAATACCCCTCCCCTAGTAAAATTATGGTACCAACTAGAAAAAGATCCC^ 
CTTTCTATGTAGATGGAGCAACTAATAGGGAAGCTAAAATAGGAA 

ATTGTTACTCTAACTAAGACAACAAATCAGAAGACTGAGTTACAAGCAATTCAGCTAGCTCTGCAGG 
AGTAAACATAGTAACAGACTCACAGTATGCATTAGGAATCATTCAAGCACAACCAGATAAGAGTGACTCAGAGATAT^ 
ACCAAATAATAGAACAGTTAATAAACAAGGAAAGAATCTACCTG^ 
GAACAAGTAGATAAATTAGTAAGTAAGGGAATTAGGAAAGTGTTG 
>ProtwtRTwt_TV1_C_ZAopt (SEQ ID NO: 70)
CCCCAGATCACCCTGTGGCAGC 

CGCCGACGACACCGTGCTGGAGGAGATCGACCTGCCCGGCAAGTGGAAGCCCAAGATGAT 
TCAAGGTGCGCCAGTACGACCAGATCCTGATCGAGATCTGCGGCAAGAA 

CCCGTGAACATCATCGGCCGCAACCTGCTGACCCAGCTGGGCTGCACCCTGAACTTCCCCATCAGCCCCATCGAGACCGT 
GCCCGTGAAGCTGAAGCCCGGCATGGACGGCCCCAAGGTGAAGCAGTGGCCCCTGACCGAGGAGAAGATCAAGGCCCTGA
CCGCCATCTGCGAGGAGATGGAGAAGGAGGGCAAGATCACCAAGATCGGCCCCGACAACCCCTACAA 
GCCATCAAGAAGAAGGACAGCACCAAGTGGCGCAAGCTGGTGGACTTCCGCGAGCTGAACAAGCG

GGAGGTGCAGCTGGGCATCCCCCACCCCGCCGGCCTGAAGAAGAAGAAGAGCGTGACCGTGCTGGACGTGGGCGACGCCT 
ACTTCAGCGTGCCCCTGGACGAGAGCTTCCGCAAGTACA^ 

ATCCGCTACCAGTACAACGTGCTGCCCCAGGGCTGGAAGGGCAGCCCCGCCATCTTCCAGAGCAGC^ 

GGAGCCCTTCCGGGCCAAGAACCCCGACATCGTGATCTACCAGTACATGGACGACCTGTACGTGGGCAGCGACCTGGAGA 

TCGGCCAGCACCGCGCCAAGATCGAGGAGCTGCGCGAGCACCTGCTGAAGTGGGGCTTCACCACCCCCGACAAGAAGCAC

CAGAAGGAGCCCCCCTTCCTGTGGATGGGCTACGAGCTGCACCCCGACAAGTGGACCGTGCAGCCCATCCTGCTGCCCGA 

GAAGGACAGCTGGACCGTGAACGACATCCAGAAGCTGGTGGGCAAGCTGAAC^ 

AGGTGCGCCAGCTGTGCAAGCTGCTGCGCGGCGCCAAGGCCCTGACCGACATCGTGCCCCTGACCGAGGAGGCCGAGCTG 
GAGCTGGCCGAGAACCGCGAGATCCTGCGCGAGCCCGTGCACGGCGTGTACTACGACCCCAGCAAGGACCTGATCGCCGA 
GATCCAGAAGCAGGGCCACGAGCAGTGGACCTACCAGATCTACCAGGAGCCCTTCAAGAACCTGAAGACCGGCAAGTACG

CCAAGATGCGCACCACCCACACCAACGACGT^^ 

ATCTGGGGCAAGACCCCCAAGTTCCGCCTGCCCATCCAGAAGGAGACCTGGGAGACCTGGTGGACCGACTACTGGCAGGC 
CACCTGGATCCCCGAGTGGGAGTTCGTGAACACCCCCCCCCTGGTGAAGCTGTGGTACCAGCTGGAGAAGGACCCCATCG 
CCGGCGTGGAGACCTTCTACGTGGACGGCGCCACCAACCGCGAGGCCAAGATCGGCAAGGCCGGCTACGTGACCGACCGC 

GGCCGCCAGAAGATCGTGACCCTGACCAACACCACCAACC

CAGCGGCAGCGAGGTGAACATCGTGACCGACAGCCAGTACGCCCTGGGCATCATCCAGGCCCAGCCCGACAA^ 

GCGAGATCTTCAACCAGATCATCGAGCAGCTGATCAACAAGGAGCGCATCTACCTGAGCTGGGTGCCCGCCCACAAGGGC 

ATCGGCGGCAACGAGCAGGTGGACAAGCTGGTGAGCAAGGGCATCCGCAAGGTGCTG 
>ProtwtRTwt_TV1_C_ZAwt (SEQ ID NO: 71)

CCTCAAATCACTCTTTGGCAGCGACCCCTTGTCT^ 

AGCAGATGATACAGTATTAGAAGAAATAGATTTGCCAGGG^

TCAAAGTAAGACAGTATGATCAAATACTTATAGAAATC^ 

CCAGTCAACATAATTGGAAGAAATCTGTTAACTCAGCTTGGATGCACACTAAATTTTCCAATTAGTCCTATTGAAACTGT
ACCAGTAAAATTAAAACCAGGAATGGATGGCCCAAAGGTCAAACAATC
CAGCAATTTGTGAGGAAATGGAGAAGGAAGGAAAAATTACAAAAATTGGGCCTGATAATCC^ 
GCCATAAAAAAGAAGGACAGTACTAAGTGGAGAAAATTAGTAGATT 

GGAAGTTCAATTAGGAATACCACACCCAGCAGGATTAAAAAAGAAAAAATCAGTGACAGTGCTAGATG 

ATTTTTCAGTTCCTTTAGATGAAAGCTTCAGGAAATATACTGCATTCACCATACCTA 

ATTAGATATCAATATAATGTGCTGCCACAGGGATC 

AGAGCCCTTCAGAGCAAAAAATCCAGACATAGTTATCTATCAATATATGGATG 

TAGGGCAACATAGAGCAAAAATAGAAGAGTTAAGGGAACATTTATTGAAATGGGGATTTA

CAAAAAGAACCCCCATTTCTTTGGATGGGGTATGAACTC 

AAAGGATAGTTGGACTGTCAATGATATACAGAAGTTAGTGGGAAAATTAAACTGGGCAAGTCAGATT^ 
AAGTAAGGCAACTCTGTAAACTCCTCAGGGGGGCCAAAGCACTAACAGACATAGTACCACTAACTGAAGAAGCAGAA 
GAATTGGCAGAGAACAGGGAAATTTTAAGAGAACCAGTACATC 
AATACAGAAACAGGGGCATGAACAATGGACATATCAAATTTA^

CAAAAATGAGGACTACCCACACTAATGATGTAAAACAGTTAACAGAGGCAGTGCAAAAAATAGCCATGGAAAGCATAGTA
ATATGGGGAAAGACTCCTAAATTTAGACTACCCATCCAAAAAGAAAC 

CACCTGGATCCCTGAGTGGGAGTTTGTTAATACCCCTCCCCTAGTAAAATTATGGTACCAACTAGAAAAAGATCCCATAG 

CAGGAGTAGAAACTTTCTATGTAGATGGAGCAACTAATAGGGAAGCTA 

GGAAGGCAGAAAATTGTTACTCTAACTAACACAACAAATCAGAAGACTGAGTT^ 

TTCAGGATCAGAAGTAAACATAGTAACAGACTCACAGTATGCATTAGGAATCATTCAAGCACAACCAGATAAGAGTGA 

CAGAGATATTTAACCAAATAATAGAACAGTTAATAAACAAGGAAAGAATCTACCTGTCATGGGT 

ATTGGGGGAAATGAACAAGTAGATAAATTAGTAAGTAAGGGAATTAGGAAAGTGTTG 
>RevExon1_TV1_C_ZAopt (SEQ ID NO: 72)

ATGGCCGGCCGCAGCGGCGACAGCGACGAGGCCCTGCTGCAGGTGGTGAAGATCATCAAGATCCTGTACCAGAGC 
>RevExon1_TV1_C_ZAwt (SEQ ID NO: 73)
ATGGCAGGAAGAAGCGGAGACA^ 
>RevExon2_TV1_C_ZAopt-2 (SEQ ID NO: 74)

CCCTACCCCAAGCCCGAGGGCACCCGCCAGGCCCGCCGCAACCGCCGCCGCCGCTGGCGCGCCCGCCAGCGCCAGATCCA
GACCATCGGCGAGCGCATCCTGGTGGCCTGCCTGGGC^ 

GCCTGCACATCAACTGCAGCGAGGGCAGCGGCACCAGCGGCACCCAGCAGAGCC^
CCCTAA 
>RevExon2_TV1_C_ZAwt (SEQ ID NO: 75)

ACCCTTACCCCAAGCCCGAGGGGACT 
CATACGATTGGTGAGCGGATTCTTGTCGCTTGCCTG

GAGACTTCATATTAATTGCAGTGAGGGCAGTGGAACTTCTGGGACACAGCAGTCTCAGGGGACTACAGAGGGGGTGGGAG
ATCCTTAA 
>RT_TV1_C_ZAopt (SEQ ID NO: 76)

CCCATCAGCCCCATCGAGACCGTGCCCGTGAAGCTGAAGCCCGGCATGGACGGCCCCA 

AGGTGAAGCAGTGGCCCCTGACCGAGGAGAAGATCAAGGCCCTGACCGCCATCTGCG 

AGGAGATGGAGAAGGAGGGCAAGATCACCAAGATCGGCCCCGACAACCCCTACAACA 

CCCCCGTGTTCGCCATCAAGAAGAAGGACAGCACCAAGTGGCGCAAGCTGGTGGACTT 

CCGCGAGCTGAACAAGCGCACCCAGGACTTCTGGGAGGTGCAGCTGGGCATCCCCCAC 

CCCGCCGGCCTGAAGAAGAAGAAGAGCGTGACCGTGCTGGACGTGGGCGACGCCTAC 

TTCAGCGTGCCGCTGGACGAGAGCTTCCGCAAGTACACCGCCTTCACCATCCCCAGCA

TCAACAACGAGACCCCCGGCATCCGCTACCAGTACAACGTGCTGCCGCAGGGCTGGAA 

GGGCAGCCCCGCCATCTTCCAGAGCAGCATGACCAAGATCCTGGAGCCCTTCCGCGCC 

AAGAACCCCGACATCGTGATCTACCAGTACATGGACGACCTGTACGTGGGCAGCGACC 

TGGAGATCGGCCAGCACCGCGCCAAGATCGAGGAGCTGCGCGAGCACCTGCTGAAGT 

GGGGCTTCACCACCCCCGACAAGAAGCACCAGAAGGAGCCCCCCTTCCTGTGGATGGG 

CTACGAGCTGCACCCCGACAAGTGGACCGTGCAGCCCATCCTGCTGCCCGAGAAGGAC 

AGCTGGACCGTGAACGACATCCAGAAGCTGGTGGGCAAGCTGAACTGGGCCAGCCAG 

ATCTACCCCGGCATCAAGGTGCGCCAGCTGTGGAAGCTGCTGCGCGGCGCCAAGGCCC 

TGACCGACATCGTGCCCCTGACCGAGGAGGCCGAGCTGGAGCTGGCCGAGAACCGCG 

AGATCCTGCGCGAGCCCGTGCACGGCGTGTACTACGACCCCAGCAAGGACCTGATCGC 

CGAGATCCAGAAGCAGGGCCACGAGCAGTGGACCTACCAGATCTACCAGGAGCCCTT 

CAAGAACCTGAAGACCGGCAAGTACGCCAAGATGCGCACCACCCACACCAACGACGT 

GAAGCAGCTGACCGAGGCCGTGCAGAAGATCGCCATGGAGAGCATCGTGATCTGGGG 

CAAGACCCCCAAGTTCCGCCTGCCCATCCAGAAGGAGACCTGGGAGACCTGGTGGACC 

GACTACTGGCAGGCCACCTGGATCCCCGAGTGGGAGTTCGTGAACACCCCCCGCCTGG 

TGAAGCTGTGGTACCAGCTGGAGAAGGACCCCATCGCCGGCGTGGAGACCTTCTACGT 

GGACGGCGCCACCAACCGCGAGGCCAAGATCGGCAAGGCCGGCTACGTGACCGACCG 

CGGCCGCCAGAAGATCGTGACCCTGACCAACACCACCAACCAGAAGACCGAGCTGCA 

GGCCATCCAGCTGGCCCTGCAGGACAGCGGCAGCGAGGTGAACATCGTGACCGACAG 

CCAGTACGCCCTGGGCATCATCCAGGCCCAGCCCGACAAGAGCGACAGCGAGATCTTC 

AACCAGATCATCGAGCAGCTGATCAACAAGGAGCGCATCTACCTGAGCTGGGTGCCCG 

CCCACAAGGGCATCGGCGGCAACGAGCAGGTGGACAAGCTGGTGAGCAAGGGCATCC 
GCAAGGTGCTG 
>RT_TV1_C_ZAwt (SEQ ID NO: 77)
ATCC^TATAACMCCAGTATTTCCCATAAA^ 

GTAC^CCTATACTGC^CC^^ 

AAGTCAGATTTACCCAGTCA^A^GTAAGGCAA 

CACTAACTGAAGAAGC^GAATTAGAATTGGCAGAGAA^^ 

C^TCAAAAGACTTGATAGCTGAAATACAGAAACA^ 

AAATCTGAAAACAGGGAAGTATGCAAAAATTCAC^ 

AAATAGCCATGGAAAGCATAGTAATATGGGGAAAGACTCCTAAATU ia 

TGGTGGACAGACTATTGGC^GCCACCTGGATCCCTGAG^GGAGT 

CCAACTAGAAAAAGATCCCATAGCAGGAGTAGAAACT



CATGGGT
>RTmut_TV1_C_ZAopt (SEQ ID NO: 78)

CCCATCAGCCCCATCGAGACCGTGCCCGTGAAGCTGAAGCCCGGCATGGACGGCCCCAAGGTGAAGCAGTGGCCCCTGAC 

CGAGGAGAAGATCAAGGCCCTGACCGCCATCTGCGAGGAGATGGAGAAG

ACCCCTACAACACCCCCGTGTTCGCCAT^ 

AACAAGCGCACCCAGGACTTCTGGGAGGTGCAGCTGGGCATCCCCCACCCCGCCGGCCTGAAGAAGAAGAAGAGCGTGAC 
CGTGCTGGACGTGGGCGACGCCTACTTCAGCGTGCCCCTGGACGAGAGCTTCCGCAAGTACACCGCCTTCACCATCCCC^
GCATCAACAACGAGACCCCCGGCATCCGCTACCAGTACAACGTGCTGCCCCAGGGCTGGAAGGGCAGC^ 
(^GAGCAGGATGACCAAGATCCTGGAGCCCTTC^ 

GGGCAGCGACCTGGAGATCGGCCAGCACCGCGCCAAGATCGAGGAGCTGCGCGAGCACCTGCTGAAGTGGGGCTTCACCA 
CCCCCGACAAGAAGCACCAGAAGGAGCCCCCCTTCCTGCCCA^^ 
CTGCTGCCCGAGAAGGACAGCTGGACCGTGAACGACATCCAGAAG^^
CCCCGGCATCAAGGTGCGCCAGCTGTGCAAGCTGCTGC^ 

AGGCCGAGCTGGAGCTGGCCGAGAACCGCGAGATCCTGCGCGAGCCCGTGCACGGCGTGTACTACGACCCCAGCAAGGAC 
CTGATCGCCGAGATCCAGAAGCAGGGCCACGAGCAGTGGACCTACCAGATCTACCAGGAGCCCTTCAAGAACCTGAAGAC 
CGGCAAGTACGCCAAGATGCGCACCACCCACACCAACGACGTC 

AGAGCATCGTGATCTGGGGCAAGACCCCCAAGTTCCGCCTGCCCATCCAGAAGGAGACCTGGGAGACCTGGTGGACCGAC 
TACTGGCAGGCCACCTGGATCCCCGAGTGGGAGTTCGTGAACACCCCCCCCCTGGTGAAGCTGTGGTACCAGCTGGAGAA 
GGACCCCATCGCCGGCGTGGAGACCTTCTACGTGGACGGCGCCACCAACCGCGAGGCCAAGATCGGCAAGGCCGGCTACG 
TGACCGACCGCGGCCGCCAGAAGATCGTGACCCTGACCAACACCACCAACCAGAAGACCGAGCTGCAGGCCATCCAGCTG 
GCCCTGCAGGACAGCGGCAGCGAGGTGAACATCGTGACCGACAGCCAGTACGCCCTGGGCATCATCCAGGCCCAGCCCGA
CAAGAGCGACAGCGAGATCTTCAACCAGATCATCGAGCAGCTGATCAACAAGGAGCGCATCTACCTGAGCTGGGTGCCCG
CCCACAAGGGCATCGGCGGCAACGAGCAGGTGGACAAGCTGGTGAGCAAGGGCATCCGCAAGGTGCTG 
>RTmut_TV1_C_ZAwt (SEQ ID NO: 79)
CCAATTAGTCCTATTGAAACTGTACCAGTAAAAT^

AGAAGAAAAAATAAAAGCATTAACAGCAATTTGTGAGGAAATGGAGAAGGAAGGAAAAATTA
ATCCATATAACACTCCAGTATTTGCCATAAAAAAGAAGGACAGTACTAAGTGGA 

AATAAAAGAACTCAAGACTTTTGGGAAGTTCAATTAGGAATACCACACCCAGCAGGATTAAAAAAGAAA
AGTGCTAGATGTGGGGGATGCATATTTTTCAGra 

GTATAAACAATGAAACACCAGGGATTAGATATCAATATAATGTGCTGCCA 

CAGAGTAGCATGACAAAAATCTTAGAGCCCTTCAGAGCAAAAAATCCAGACATAGTTATCTATCAAGCCCCGTTGTATGT
AGGATCTGACTTAGAAATAGGGCAACATAGAGCAAAAATAGAAGAGTTAAGGGAACATTTATTGAAATGGGGATTTAC^ 
CACCAGACfcAGAAACATCAAAAAGAACCCCCATTTCTTCCCATCGA^ 

CTGCTGCCAGAAAAGGATAGTTGGACTGTCAATGATATACAGAAGTTAGTGGGAAAATTAAACTGGGCAAGTCAGATTTA
CCCAGGGATTAAAGTAAGGCAACTCTGTAAACTCCTCAGGGGGGCCAAAGCACTAACAGACATAGTACCACTAACTGAAG 
AAGCAGAATTAGAATTGGCAGAGAACAGGGAAATTTTAAGAGAACCAGTACATGGAGTATATTATGATCCATCAAAAGAC 
TTGATAGCTGAAATACAGAAACAGGGGCATGAACAATGGACATATCAAATTTA^ 

AGGGAAGTATGCAAAAATGAGGACTACCCACACTAATGATGTAAAACAGTTAACAGAGGCAGTGCAAAAAATAGCCATGG 
AAAGCATAGTAATATGGGGAAAGACTCCTAAATTTAGACTACCCATCCAAAAAGAAACATGGGAGACATGGTGGACAGAC 
TATTGGCAAGCCACCTGGATCCCTGAGTGGGAGTTTGTTAATACCCCTCCCCTAGTAAAATTATGGTACCAACTAGAAAA 
AGATCCCATAGCAGGAGTAGAAACTTTCTATGTAGATGGAGCAACTAATAGGGAAGCTAAAATAGGAAAAGCAGGGTATG 
TTACTGACAGAGGAAGGCAGAAAATTGTTACTCTAACTAACACAACAAATCAGAAGACTGAGTTACAAGCAATTCAGCTA 
GCTCTGCAGGATTCAGGATCAGAAGTAAACATAGTAACAGACTCACAGTATGCATTAGGAATCATTCAAGCACAACCAGA 
TAAGAGTGACTCAGAGATATTTAACCAAATAATAGAACAGTTAATAAACAAGGAAAGAATCTACCTGTCATGGGTACCAG
CACATAAAGGAATTGGGGGAAATGAACAAGTAGATAAATTAGTAAGTAAGGGAATTAGGAAAGTGTTG 
>TatC22Exon1_TV1_C_ZAopt (SEQ ID NO: 80)

ATGGAGCCCGTGGACCCCAAGCTGRAGCCCTGGAACCACCCCGGCAGCCAGCCC^GACCGCCCMCAAC^CTGCTT^ 
CAAGCACTGCAGCTACCACTGCCTGGTGTGCTTCCAGACCAAGGGCCTGGGCATCAGCTACGGCCGCAAGAAGCGCCGCC 
AGCGCCGCAGCGCCCCCCCCAGCGGCGAGGACCACCAGAACCCCCTGAGCAAGCAG 
>TatExon1_TV1_C_ZAopt (SEQ ID NO: 81)

ATGGAGCCCGTGGACCCCAAGCTGAAGCCCTGGAACCACCCCGGCAGCCAGCCCAAGACCGCCTGCAACAACTGCTTCTC
O^GCACTGCAGCTACCACTGCCTGGTGTGCTTCCAGACCAAGGGCCTGGGCATCAGCTACGGCCGCAAGAAGCGCCGCC 
AGCGCCGCAGCGCCCCCCCCAGCGGCGAGGACCACCAGAACCCCCTGAGCAAGCAG 
>TatExon1_TV1_C_ZAwt (SEQ ID NO: 82)

ATGGAGCCAGTAGATCCTAAACTAAAGCCCTGGAACCATCCAC^ 
CAAACACTGTAGCTATCATTGTCTAGTTTGCTTT 

AGCGACGAAGCGCTCCTCCAAGTGGTGAAGATCATCAAAATCCTCTATCAAAGCAG 
>TatExon2_TV1_C_ZAopt (SEQ ID NO: 83)

CCCCTGCCCCAGGCCCGCGGCGACAGCACCGGCAGCGAGGAGAGCAAGAAGAAGGTGGAGAGCAAGACCGAGACCGACCC
CTACGACTGGTGA 
>TatExon2_TV1_C_ZAwt (SEQ ID NO: 84)

CCCTTACCCCAAGCCCGAGGGGACTCGACAGGCTCGGAGGAATC 
ATACGATTGGTGA 
>Vif_TV1_C_ZAopt (SEQ ID NO: 85)

ATGGAGAACCGCTGGCAGGTGCTGATCGTGTGGCAGGTGG^ 

CCACATGTAC^TCAGCCGCCGCGCCAGCGGCTGGGTGTACCGCCAC<^ 

AGGTGCACATCCCCCTGGGCGACGCCCGCCTGGTGATCAAGACCTACTG^ 

CTGGGCCACGGCGTGAGCATCGAGTGGCGCCTGCGCGAGTACAGCAC 

CCAC^TGCACTACTTCGACra 

ACTACCAGGCCGGCCACAAGAAGGTGGGCAGCCTGCAGTACCTGGCCCTGACCGCCCTGATCAAGCCCAAGAAGCGCAAG
CCCCCCCTGCCCAGCGTGCGCAAGCTGGTGGAGGACCGCTGGAACGACCCCCAGAAGACCCGCGGCCGCCGCGGCAACCA 
CACCATGAACGGCCACTAG
.f_TV1_C_ZAwt (SEQ ID NO: 86)

JGAAAACAGATGGCAGGTGCTGATTGTGTGGCAGGTGGACAGGATGAAGATTAGAGCATGGAATAGTTTAGTAAAG^ 
lTATGTATATATC^GGAGAGCTAGTGGATGGGTC^ 

;tacatatcccattaggggatgctagattagtaat^^ 

jggtcatggagtctccatagaatggagactgagagaatacagcacacaagtagaccctgacctggcagaccagctaat 
^catgcattattttgattgttttac^gaatctgccataagacaagccatattaggacacatagt 

i n ATCAAGCAGGACATAAGAAGGTAGGATCTCTGCAATACTTGGCACTGACAGGATTGATAAAACCAAAAAAGAGAAAG 
VCCTCTGCCTAGTGTTAGAAAATTAGTAGAGGATAGATGGAACGACCCCCAGAAGACCAGGGGCCGCAGAGGGAACCA 

:aatgaatggacactag 
>Vpr_TV1_C_ZAopt (SEQ ID NO: 87)

ATGGAGCGCCCCCCCGAGGACCAGGGCCCCCAGCGCGAGCCCTACAACGAGTGGACCCTGGAGATCCTGGAGGAGCTGAA 
nrAGGAGGCCGTGCGCCACTTCCCCCGCCCCTGGCTGCACAGCCTGGGCCAGTACATCTACGAGACCTACGGCGACACCT 
GGACCGGCGTGGAGGCCATCATCCGCGTGCTGCAGCAGCTGCTGTTCATCCACTTCCGCATCGGCTGCCAGCACAGCCGC 
ATCGGCATCCTGCGCCAGCGCCGCGCCCGCAACGGCGCCAGCCGCAGC 
>Vpr_TV1_C_ZAwt (SEQ ID NO: 88)

ATGGAACGACCCCGAGAAGACCAGGGGCCGCAGAGGGAACCATACAATGAATGG^ 

GCAGGAAGCTGTCAGACACTTTCCTAGACCATGGCTCCATAGCTTAGGACAATATATCTATGAAACCTATGGGGATACTT 

GGACGGGAGTTGAAGCTATAATAAGAGTACTGCAACAACTACTGTTC^ 

ATAGGCATCTTGCGACAGAGAAGAGCAAGAAATGGAGCCAGTAGATCC 



FIGURE 59 
>Vpu_TV1_C_ZAopt (SEQ ID NO: 89)

ATGGTGAGCCTGAGCCTGTTCAAGGGCGTGGACTACCGCCTGGGCGTGGGCGCCCTGATCGTGGCCCTGATCATCGCCAT 
rATCGTGTGGACCATCGCCTACATCGAGTACCGCAAGCTGGTGCGCCAGAAGAAGATCGACTGGCTGATCAAGCGCATCC 
^GAGCGCGCCGAGGACAGCGGCAACGAGAGCGACGGCGACACCGAGGAGCTGAGCACCATGGTGGACA 

CGCCTGCTGGACGCCAACGACCTGTAA 
>Vpu_TV1_C_ZAwt (SEQ ID NO: 90)

ATGGTAAGTTTAAGTTTATTTAAAGGAGTAGATTATAGATT^ 
AATAGTGTGGACCATAGCATATATAGAATATAGGAAATTGGT^ 
GGGAAAGAGCAGAAGACAGTGGCAATGAGAGTGATGGGGAGAC^GAAG 
AGGCTTCTGGATGCTAATGATTTGTAA 
dna revexon1_2TV1_C_ZAop (SEQ ID NO:91)

ATGGCCGGCCGCAGCGGCGACAGCGACGAGGCCCTGCTGCAGGTGGTGAAGATCATC 

SSScAGAGCCCCTACCCCAAGCCCG^ 

ACCGCCGCCGCCGCTGGCGCGCCCGCCAGCGCCAGATCCA^ 

rrTGGTGGCCTGCCTGGGCCGCAGCGCCGAGCCCGTGCCCCTGCAGCTGCCCCCCCTG 

gag^g^ogIcacatcaactgcagcgagggcagcggcaccagcggcacccagcagagc 

CAGGGCACCACCGAGGGCGTGGGCGACCCCTAA 
dna Revexon1_2_TV1_C_ZAwt (SEQ ID NO:92)



ATGGCAGGAAGAAGCGGAGACAGCGACGAAGCGCTCCTCCAAGTGGTGAAGATCATC 

AAAATCCTCTATCAAAGCAACCCTTACCCCAAGCCCGAGGGGACTCGACAGGCTCGGA 

GGAATCGAAGAAGAAGGTGGAGAGCAAGACAGAGACAGATCCATACGATTGGTGAGC 

GGATTCTTGTCGCTTGCCTGGGACGATCTGCGGAGCCTGTGCCTCTTCAGCTACCACCG 

CTTGAGAGACTTCATATTAATTGCAGTGAGGGCAGTGGAACTTCTGGGACACAGCAGT 

CTCAGGGGACTACAGAGGGGGTGGGAGATCCTTAA 
dna TatC22Exon1_2_TV1_C_ZAopt (SEQ ID NO:93)

ATGGAGCCCGTGGACCCCAAGCTGAAGCCCTGGAACCACCCCGGCAGCCAGCCCAAG 

ACCGCCGGCAACAACTGCTTCTGCAAGCACTGCAGCTACCACTGCCTGGTGTGCTTCC

AGACCAAGGGCCTGGGCATCAGCTACGGCCGCAAGAAGCGCCGCCAGCGCCGCAGCG 

CCCCCCCCAGCGGCGAGGACCACCAGAACCCCCTGAGCAAGCAGCCCCTGCCCCAGGC 

CCGCGGCGACAGCACCGGCAGCGAGGAGAGCAAGAAGAAGGTGGAGAGCAAGACCG 

AGACCGACCCCTACGACTGGTGA 
dna TatExon1_2_TV1_C_ZAopt (SEQ ID NO:94)



ATGGAGCCCGTGGACCCCAAGCTGAAGCCCTGGAACCACCCCGGCAGCCAGCCCAAG 

ACCGCCTGCAACAACTGCTTCTGCAAGCACTGCAGCTACCACTGCCTGGTGTGCTTCCA 

GACCAAGGGCCTGGGCATCAGCTACGGCCGCAAGAAGCGCCGCAGCGCCGCAGCGCC 

CCCCCCAGCGGCGAGGACCACCAGAACCCCCTGAGCAAGCAGCCCCTGCCCCAGGCCC 

GCGGCGACAGCACCGGCAGCGAGGAGAGCAAGAAGAAGGTGGAGAGCAAGACCGAG 

ACCGACCCCTACGACTGGTGA 



FIGURE 65 
dna TatExon1_2_TV1_C_ZAwt (SEQ ID NO: 95)



ATGGAGCCAGTAGATCCTAAACTAAAGCCCTGGAACCATCCAGGAAGCCAACCTAAA 

ACAGCTTGTAATAATTGCTTTTGCAAACACTGTAGCTATCATTGTCT 

GACAAAAGGTTTAGGCATTTCCTATGGCAGGAAGAAGCGGAGACAGCGACGAAGCGC 

TCCTCCAAGTGGTGAAGATCATCAAAATCCTCTATCAAAGCAGCCCTTACCCCAAGCC 

CGAGGGGACTCGACAGGCTCGGAGGAATCGAAGAAGAAGGTGGAGAGCAAGACAGA 

GACAGATCCATACGATTGGTGA 



NefD125G-Myr_TV1_C_ZAopt (SEQ ID NO: 96)



ATGGCCGGCAAGTGGAGCAAGCGCAGCATCGTGGGCTGGCCCGCCGTGCGC 

GAGCGCATGCGCCGCACCGAGCCCGCCGCCGAGGGCGTGGGCGCCGCCAGC 

CAGGACCTGGACCGCCACGGCGCCCTGACCAGCAGCAACACCCCCGCCACCA 

ACGAGGCCTGCGCCTGGCTGCAGGCCCAGGAGGAGGACGGCGACGTGGGCT 

TCCCCGTGCGCCCCCAGGTGCCCCTGCGCCCCATGACCTACAAGAGCGCCGT 

GGACCTGAGCTTCTTCCTGAAGGAGAAGGGCGGCCTGGAGGGCCTGATCTAC 

AGCCGCAAGCGCCAGGAGATCCTGGACCTGTGGGTGTACAACACCCAGGGG'E- 

TCTTCCCCGGCTGGCAGAACTACACCAGCGGCCCCGGCGTGCGCTTCCGCCTG 

ACCTTCGGCTGGTGCTTCAAGCTGGTGCCCGTGGACCCCCGCGAGGTGAAGG 

AGGCCAACGAGGGCGAGGACAACTGCCTGCTGCACCCCATGAGCCAGGACG 

GCGCCGAGGACGAGGACCGCGAGGTGCTGAAGTGGAAGTTCGACAGCCTGC 

TGGCCCACCGCCACATGGCCCGCGAGCTGCACCCCGAGTACTACAAGGACTG 

CTGA 
ATGCGCGCCCGCGGCATCCTGAAGAACTACCGCCACTGGTGGATCTGGGGCATCCT

GGGCTTCTGGATGCTGATGATGTGCAACGTGAAGGGCCTGTGGGTGACCGTGTACTA 

CGGCGTGCCCGTGGGCCGCGAGGCCAAGACCACCCTGTTCTGCGCCAGCGACGCCA 

AGGCCTACGAGAAGGAGGTGCACAACGTGTGGGCCACCCACGCCTGCGTGCCCACC 

GACCCCAACCCCCAGGAGGTGATCCTGGGCAACGTGACCGAGAACTTCAACATGTG 

GAAGAACGACATGGTGGACCAGATGCAGGAGGACATCATCAGCCTGTGGGACCAGA 

GCCTGAAGCCCTGCGTGAAGCTGACCCCCCTGTGCGTGACCCTGAACTGCACCAACG 

CCACCGTGAACTACAACAACACCAGCAAGGACATGAAGAACTGCAGCTTCTACGTG 

ACCACCGAGCTGCGCGACAAGAAGAAGAAGGAGAACGCCCTGTTCTACCGCCTGGA 

CATCGTGCCCCTGAACAACCGCAAGAACGGCAACATCAACAACTACCGCCTGATCA 

ACTGCAACACCAGCGCCATCACCCAGGCCTGCCCCAAGGTGAGCTTCGACCCCATCC 

CCATCCACTACTGCGCCCCCGCCGGCTACGCCCCCCTGAAGTGCAACAACAAGAAG 

TTCAACGGCATCGGCCCCTGCGACAACGTGAGCACCGTGCAGTGCACCCACGGCAT 

CAAGCCCGTGGTGAGCACCCAGCTGCTGCTGAACGGCAGCCTGGCCGAGGAGGAGA 

TCATCATCCGCAGCGAGAACCTGACCAACAACGTGAAGACCATCATCGTGCACCTG 

AACGAGAGCATCGAGATCAAGTGCACCCGCCCCGGCAACAACACCCGCAAGAGCGT 

GCGCATCGGCCCCGGCCAGGCCTTCTACGCCACCGGCGACATCATCGGCGACATCC 

GCCAGGCCGACTGCAACATCAGCAAGAACGAGTGGAACACCACCCTGCAGCGCGTG 

AGCCAGAAGCTGCAGGAGCTGTTCCCCAACAGCACCGGCATCAAGTTCGCCCCCCA 

CAGCGGCGGCGACCTGGAGATCACCACCCACAGCTTCAACTGCGGCGGCGAGTTCT 

TCTACTGCAACACCACCGACCTGTTCAACAGCACCTACAGCAACGGCACCTGCACCA 

ACGGCACCTGCATGAGCAACAACACCGAGCGCATCACCCTGCAGTGCCGCATCAAG 

CAGATCATCAACATGTGGCAGGAGGTGGGCCGCGCCATGTACGCCCCCCCCATCGC 

CGGCAACATCACCTGCCGCAGCAACATCACCGGCCTGCTGCTGACCCGCGACGGCG 

GCGACAACAACACCGAGACCGAGACCTTCCGCCCCGGCGGCGGCGACATGCGCGAC 

AACTGGCGCAGCGAGCTGTACAAGTACAAGGTGGTGGAGATCAAGCCCCTGGGCGT 

GGCCCCCACCGCCGCCAAGCGCCGCGTGGTGGAGCGCGAGAAGCGCGCCGTGGGCA 

TCGGCGCCGTGTTCCTGGGCTTCCTGGGCGCCGCCGGCAGCACCATGGGCGCCGCCA 

GCATCACCCTGACCGTGCAGGCCCGCCAGCTGCTGAGCGGCATCGTGCAGCAGCAG 

AGCAACCTGCTGCGCGCCATCGAGGCCCAGCAGCACATGCTGCAGCTGACCGTGTG 

GGGCATCAAGCAGCTGCAGGCCCGCGTGCTGGCCATCGAGCGCTACCTGCAGGACC 

AGCAGCTGCTGGGCCTGTGGGGCTGCAGCGGCAAGCTGATCTGCACCACCAACGTG 

CTGTGGAACAGCAGCTGGAGCAACAAGACCCAGAGCGACATCTGGGACAACATGAC 

CTGGATGCAGTGGGACCGCGAGATCAGCAACTACACCAACACCATCTACCGCCTGC 

TGGAGGACAGCCAGAGCCAGCAGGAGCGCAACGAGAAGGACCTGCTGGCCCTGGA 

CCGCTGGAACAACCTGTGGAACTGGTTCAGCATCACCAACTGGCTGTGGTACATCAA 

GATCTTCATCATGATCGTGGGCGGCCTGATCGGCCTGCGCATCATCTTCGCCGTGCT 

GAGCCTGGTGAACCGCGTGCGCCAGGGCTACAGCCCCCTGAGCCTGCAGACCCTGA 

TCCCCAACCCCGGCGGCCCCGACCGCCTGGGCGGCATCGAGGAGGAGGGCGGCGAG 

CAGGACAGCAGCCGCAGCATCCGCCTGGTGAGCGGCTTCCTGACCCTGGCCTGGGA 

CGACCTGCGCAGCCTGTGCCTGTTCTGCTACCACCGCCTGCGCGACTTCATCCTGAT 

CGTGGTGCGCGCCGTGGAGCTGCTGGGCCACAGCAGCCTGCGCGGCCTGCAGCGCG 

GCTGGGGCACCCTGAAGTACCTGGGCAGCCTGGTGCAGTACTGGGGCCTGGAGCTG 

AAGAAGAGCGCCATCAACCTGCTGGACACCATCGCCATCGCCGTGGCCGAGGGCAC 

CGACCGCATCCTGGAGTTCATCCAGAACCTGTGCCGCGGCATCCGCAACGTGCCCCG 

CCGCATCCGCCAGGGCTTCGAGGCCGCCCTGCAGTAA 
ATGAGAGCGAGGGGGATACTGAAGAATTATCGACACTGGTGGATATGGGGCATCTT

AGGCTTTTGGATGCTAATGATGTGTAATGTGAAGGGCTTGTGGGTCACAGTCTACTA 

CGGGGTACCTGTGGGGAGAGAAGCAAAAACTACTCTATTTTGTGCATCAGATGCTA 

AAGCATATGAGAAAGAAGTGCATAATGTCTGGGCTACACATGCCTGTGTACCCACA 

GACCCCAACCCACAAGAAGTGATTTTGGGCAATGTAACAGAAAATTTTAACATGTG 

GAAAAATGACATGGTGGATCAGATGCAGGAAGATATAATCAGTTTATGGGATCAAA 

GCCTTAAGCCATGTGTAAAATTGACCCCACTCTGTGTCACTTTAAACTGTACAAATG 

CAACTGTTAACTACAATAATACCTCTAAAGACATGAAAAATTGCTCTTTCTATGTAA 

CCACAGAATTAAGAGATAAGAAAAAGAAAGAAAATGCACTTTTTTATAGACTTGAT 

ATAGTACCACTTAATAATAGGAAGAATGGGAATATTAACAACTATAGATTAATAAA 

ttgtaatacctcagccataacacaagcctgtccaaaagtctcgtttgacccaattcc 

tatacattattgtgctccagctggttatgcgcctctaaaatgtaataataagaaatt 

caatggaataggAccatgcgataatgtcagcacagtacaatgtacacatggaatta 

agccagtggtatcaactcaattactgttaaatggtagcctagcagaagaagagata 

ataattagatctgaaaatctgacaaacaatgtcaaaacaataatagtacatgttaat

gaatctatagagattaaatgtacaagacctggcaataatacaagaaagagtgtgag 

aataggaccaggacaagcattctatgcaacaggagacataataggagatataagac 

aagcacattgtaacattagtaaaaatgaatggaatacaactttacaaagggtaagt 

caaaaattacaagaactcttccctaatagtacagggataaaatttgcaccacactca 

ggaggggacctagaaattactacacatagctttaattgtggaggagaatttttctat 

tgcaatacaacagacctgtttaatagtacatacagtaatggtacatgcactaatggt 

acatgcatgtctaataatacagagcgcatcacactccaatgcagaataaaacaaat 

tataaacatgtggcaggaggtaggacgagcaatgtatgcccctcccattgcaggaa 

acataacatgtagatcaaatattacaggactactattaacacgtgatggaggagat 

aataatactgaaacagagacattcagacctggaggaggagacatgagggacaattg 

gagaagtgaattatataaatacaaggtggtagaaattaaaccattaggagtagcac 

ccactgctgcaaaaaggagagtggtggagagagaaaaaagagcagtaggaatagg 

agctgtgttccttgggttcttgggagcagcaggaagcactatgggcgcagcatcaat 

aacgctgacggtacaggccagacaattattgtctggtatagtgcaacagcaaagta 

atttgctgagggctatagaggcgcaacagcatatgttgcaactcacggtctggggc 

attaagcagctccaggcaagagtcctggctatagagagatacctacaggatcaaca 

gctcctaggactgtggggctgctctggaaaactcatctgcaccactaatgtgctttg 

gaactctagttggagtaataaaactcaaagtgatatttgggataacatgacctggat 

gcagtgggatagggaaattagtaattacacaaacacaatatacaggttgcttgaag 

actcgcaaagccagcaggaaagaaatgaaaaagatttactagcattggacaggtgg 

aacaatctgtggaattggtttagcataacaaattggctgtggtatataaaaatattc 

ataatgatagtaggaggcttgataggtttaagaataatttttgctgtgctctctcta 

gtaaatagagttaggcagggatacrcacccrrgtcattgcagacccttatcccaaac 

ccgaggggacccgacaggctcggaggaatcgaagaagaaggtggagagcaagaca 

gcagcagatccattcgattagtgagcggattcttgacacttgcctgggacgacctac 

gaagccrgtgcctcttctgctaccaccgattgagagacttcatattaattgtagtga 

gagcagtggaacttctgggacacagtagtctcaggggactgcagagggggtgggga 

acccttaagtatttggggagtcttgtgcaatattggggtctagagttaaaaaagagt 

gctattaatctgcttgatactatagcaatagcagtagctgaaggaacagataggatt 

ctagaattcatacaaaacctttgtagaggtatccgcaacgtacctagaagaataag 

acagggcttcgaagcagctttgcaataa 
Gag_TV2_C_ZAopt (SEQ ID NO: 99)

ATGGGCGCCCGCGCCAGCATCCTGCGCGGCGGCAAGCTGGACAAGTGGGAG 

AAGATCCGCCTGCGCCCCGGCGGCCGCAAGCACTACATGCTGAAGCACCTGG 

TGTGGGCCAGCCGCGAGCTGGAGCGCTTCGCCGTGAACCCCGGCCTGCTGGA 

GACCAGCGACGGCTGCCGCCAGATCATCAAGCAGCTGCAGCCCGCCCTGCAG 

ACCGGCACCGAGGAGATCCGCAGCCTGTTCAACACCGTGGCCACCCTGTACT 

GCGTGCACAAGGGCATCGACGTGCGCGACACCAAGGAGGCCCTGGACAAGA 

TCGAGGAGGAGCAGAACAAGTGCCAGCAGAAGACCCAGCAGGCCGAGGCCG 

CCGACAAGAAGGTGAGCCAGAACTACCCCATCGTGCAGAACCTGCAGGGCC 

AGATGGTGCACCAGGCCATCAGCCCCCGCACCCTGAACGCCTGGGTGAAGGT 

GATCGAGGAGAAGGCCTTCAGCCCCGAGGTGATCCCCATGTTCACCGCCCTG 

AGCGAGGGCGCCACCCCCCAGGACCTGAACACCATGCTGAACACCGTGGGC 

GGCCACCAGGCCGCCATGCAGATGCTGAAGGACACCATCAACGAGGAGGCC 

GCCGAGTGGGACCGCCTGCACCCCGTGCACGCCGGCCCCGTGGCCCCCGGCC 

AGATGCGCGAGCCCCGCGGCAGCGACATCGCCGGCACCACCAGCACCCTGCA 

GGAGCAGATCGCCTGGATGACCAGCAACCCCCCCATCCCCGTGGGCGACATC 

TACAAGCGCTGGATCATCCTGGGCCTGAACAAGATCGTGCGCATGTACAGCC 

CCGTGAGCATCCTGGACATCAAGCAGGGCCCCAAGGAGCCCTTCCGCGACTA 

CGTGGACCGCTTCTTCAAGACCCTGCGCGCCGAGCAGAGCACCCAGGAGGTG 

AAGAACTGGATGACCGACACCCTGCTGGTGCAGAACGCCAACCCCGACTGCA 

AGACCATCCTGCGCGCCCTGGGCCCCGGCGCCAGCCTGGAGGAGATGATGAC 

CGCCTGCCAGGGCGTGGGCGGCCCCAGCCACAAGGCCCGCGTGCTGGCCGAG 

GCCATGAGCCAGGCCAACAACACCAGCGTGATGATCCAGAAGAGCAACTTC 

AAGGGCCCCCGCCGCGCCGTGAAGTGCTTCAACTGGGGCCGCGAGGGCCACA 

TCGCCCGCAACTGCCGCGCCCCCCGCAAGCGCGGCTGCTGGAAGTGCGGCAA 

GGAGGGCCACCAGATGAAGGACTGCACCGAGCGCCAGGCCAACTTCCTGGG 

CAAGATCTGGCCCAGCCACAAGGGCCGCCCCGGCAACTTCCTGCAGAGCCGC 

CCCGAGCCCACCGCCCCCCCCCTGGAGCCCACCGCCCCCCCCGCCGAGAGCT 

TCAAGTTCAAGGAGACCCCCAAGCAGGAGCCCAAGGACCGCGAGCGCCTGA 

CCAGCCTGAAGAGCCTGTTCGGCAGCGACCCCCTGAGCCAGTAA
Gag_TV2_C_ZAwt (SEQ ID NO: 100)



ATGGGTGCGAGAGCGTCAATATTAAGAGGGGGAAAATTAGACAAATGGGAA 

AAAATTAGGTTACGGCCAGGGGGGAGAAAACACTATATGCTAAAACACCTA 

GTATGGGCAAGCAGAGAGCTGGAAAGATTTGCAGTTAACCCTGGCCTTTTAG 

AGACATCAGACGGATGTAGACAAATAATAAAACAGCTACAACCAGCTCTTCA 

GACAGGAACAGAGGAAATTAGATCATTATTTAACACAGTAGCAACTCTCTAT 

TGTGTACATAAAGGGATAGATGTACGAGACACCAAGGAAGCCTTAGACAAG 

ATAGAGGAGGAACAAAACAAATGTCAGCAAAAAACACAGCAGGCGGAAGGG 

GCTGACAAAAAGGTCAGTCAAAATTATCCTATAGTGCAGAACCTCCAAGGGC 

AAATGGTACACCAGGCCATATCACCTAGAACCTTGAATGCATGGGTAAAAGT 

AATAGAGGAGAAGGCTTTTAGCCCAGAGGTAATACCCATGTTTACAGCATTA 

TCAGAAGGAGCCACCCCACAAGATTTAAACACCATGTTAAATACAGTGGGGG 

GACATCAAGCAGCCATGCAAATGTTAAAAGATACCATCAATGAGGAGGCTGC 

AGAATGGGATAGGTTACATCCAGTACATGCAGGGCCTGTTGCACCAGGCCAG 

ATGAGAGAACCAAGGGGAAGTGACATAGCAGGAACTACTAGTACCCTTCAA 

GAACAAATAGCATGGATGACAAGTAACCCACCTATCCCAGTAGGGGACATCT 

ATAAAAGGTGGATAATTCTGGGGTTAAATAAAATAGTAAGAATGTACAGCCC 

TGTCAGCATTTTAGACATAAAACAAGGACCAAAGGAACCCTTTAGAGACTAT 

GTAGACCGGTTCTTCAAAACTTTAAGAGCTGAACAATCTACACAAGAGGTAA 

AAAATTGGATGACAGACACCTTGTTAGTCCAAAATGCGAACCCAGATTGTAA 

GACCATTTTAAGAGCATTAGGACCAGGGGCTTCATTAGAAGAAATGATGACA 

GCATGTCAGGGAGTGGGAGGACCTAGCCACAAAGCAAGAGTTTTGGCTGAG 

GCAATGAGCCAAGCAAACAATACAAGTGTAATGATACAGAAAAGCAATTTTA 

AAGGCCCTAGAAGAGCTGTTAAATGTTTCAACTGTGGCAGGGAAGGGCACAT 

AGCCAGGAATTGCAGGGCCCCTAGGAAAAGGGGCTGTTGGAAATGTGGAAA 

GGAAGGACACCAAATGAAAGACTGTACTGAGAGGCAGGCTAATTTTTTAGGG 

AAAATTTGGCCTTCCCACAAGGGGAGGCCAGGGAATTTCCTTCAGAGCAGAC 

CAGAGCCAACAGCCCCACCACTAGAACCAACAGCCCCACCAGCAGAGAGCT 

TCAAGTTCAAGGAGACTCCGAAGCAGGAGCCGAAAGACAGGGAACCTTTAA 

CTTCCCTCAAATCACTCTTTGGCAGCGACCCCTTGTCTCAATAA 
Nef_TV2_C_ZAopt (SEQ ID NO: 101)



ATGGGCGGCAAGTGGAGCAAGAGCAGCATCATCGGCTGGCCCGAGGTGCGC 

GAGCGCATCCGCCGCACCCGCAGCGCCGCCGAGGGCGTGGGCAGCGCCAGC 

CAGGACCTGGAGAAGCACGGCGCCCTGACCACCAGCAACACCGCCCACAAC 

AACGCCGCCTGCGCCTGGCTGGAGGCCCAGGAGGAGGAGGGCGAGGTGGGC 

TTCCCCGTGCGCCCCCAGGTGCCCCTGCGCCCCATGACCTACAAGGCCGCCAT 

CGACCTGAGCTTCTTCCTGAAGGAGAAGGGCGGCCTGGAGGGCCTGATCTAC 

AGCAAGAAGCGCCAGGAGATCCTGGACCTGTGGGTGTACAACACCCAGGGC

TTCTTCCCCGACTGGCAGAACTACACCCCCGGCCCCGGCGTGCGCTTCCCCCT

GACCTTCGGCTGGTACTTCAAGCTGGAGCCCGTGGACCCCCGCGAGGTGGAG 

GAGGCCAACGAGGGCGAGAACAACTGCCTGCTGCACCCCATGAGCCAGCAC 

GGCATGGAGGACGAGGACCGCGAGGTGCTGCGCTGGAAGTTCGACAGCACC 

CTGGCCCGCCGCCACATGGCCCGCGAGCTGCACCCCGAGTACTACAAGGACT 

GCTGA 
Nef_TV2_C_ZAwt (SEQ ID NO: 102)

ATGGGGGGCAAGTGGTCAAAAAGCAGTATAATTGGATGGCCTGAAGTAAGA 

GAAAGAATCAGACGAACTAGGTCAGCAGCAGAGGGAGTAGGATCAGCGTCT 

CAAGACTTAGAGAAACATGGGGCACTTACAACCAGCAACACAGCCCACAAC 

AATGCTGCTTGCGCCTGGCTGGAAGCGCAAGAGGAGGAAGGAGAAGTAGGC 

TTTCCAGTCAGACCTCAGGTACCTTTAAGACCAATGACTTATAAAGCAGCAAT 

AGATCTCAGCTTCTTTTTAAAAGAAAAGGGGGGACTGGAAGGGTTAATTTAC

TCCAAGAAAAGGCAAGAGATCCTTGATTTGTGGGTTTATAACACACAAGGCT 

TCTTCCCTGATTGGCAAAACTACACACCGGGACCAGGGGTCAGATTTCCACT

GACCTTTGGATGGTACTTCAAGCTAGAGCCAGTCGATCCAAGGGAAGTAGAA 

GAGGCCAATGAAGGAGAAAACAACTGTTTACTACACCCTATGAGCCAGCATG

GAATGGAGGATGAAGACAGAGAAGTATTAAGATGGAAGTTTGACAGTACGC 

TAGCACGCAGACACATGGCCCGCGAGCTACATCCGGAGTATTACAAAGACTG 

CTGA 
Pol_TV2_C_ZAopt (SEQ ID NO: 103)

TTCTTCCGCGAGAACCTGGCCTTCCCCCAGGGCGAGGCCCGCGAGTTCCCCAGCGAGCAGACC

CGCGCCAACAGCCCCACCACCCGCACCAACAGCCCCACCAGCCGCGAGCTGCAGGTGCAGGG 

CGACAGCGAGGCCGGCGCCGAGCGCCAGGGCACCTTCAACTTCCCCCAGATCACCCTGTGGC 

AGCGCCCCCTGGTGAGCATCAAGGTGGCCGGCCAGACCAAGGAGGCCCTGCTGGACACCGGC 

GCCGACGACACCGTGCTGGAGGAGATCAACCTGCCCGGCAAGTGGAAGCCCAAGATGATCGG 

CGGCATCGGCGGCTTCATCAAGGTGCGCCAGTACGACCAGATCCTGATCGAGATCTGCGGCA

AGCGCGCCATCGGCACCGTGCTGGTGGGCCCCACCCCCGTGAACATCATCGGCCGCAACCTGC 

TGACCCAGCTGGGCTGCACCCTGAACTTCCCCATCAGCCCCATCGAGACCGTGCCCGTGAAGC 

TGAAGCCCGGCATGGACGGCCCCAAGGTGAAGCAGTGGCCCCTGACCGAGGAGAAGATCAAG 

GCCCTGACCGAGATCTGCGAGGAGATGGAGAAGGAGGGCAAGATCACCAAGATCGGCCCCG

AGAACCCCTACAACACCCCCGTGTTCGCCATCAAGAAGAAGGACAGCACCAAGTGGCGCAAG 

CTGGTGGACTTCCGCGAGCTGAACAAGCGCACCCAGGACTTCTGGGAGGTGCAGCTGGGCAT

CCCCCACCCCGCCGGCCTGAAGAAGAAGAAGAGCGTGACCGTGCTGGACGTGGGCGACGCCT 

ACTTCAGCGTGCCCCTGGACGAGAGCTTCCGCAAGTACACCGCCTTCAGCATCCCCAGCATCA 

ACAACGAGACCCCCGGCATCCGCTACCAGTACAACGTGCTGCCCCAGGGCTGGAAGGGCAGC 

CCCGCCATCTTCCAGAGCAGCATGACCCGCATCCTGGAGCCCTTCCGCACCCAGAACCCCGAG 

GTGGTGATCTACCAGTACATGGACGACCTGTACGTGGGCAGCGACCTGGAGATCGGCCAGCA 

CCGCGCCAAGATCGAGGAGCTGCGCGGCCACCTGCTGAAGTGGGGCTTCACCACCCCCGACA 

AGAAGCACCAGAAGGAGCCCCCCTTCCTGTGGATGGGCTACGAGCTGCACCCCGACAAGTGG 

ACCGTGCAGCCCATCCAGCTGCCCGAGAAGGAGAGCTGGACCGTGAACGACATCCAGAAGCT 

GGTGGGCAAGCTGAACTGGGCCAGCCAGATCTACCCCGGCATCAAGGTGCGCCAGCTGTGCA 

AGCTGCTGCGCGGCGCCAAGGCCCTGACCGACATCGTGCCCCTGACCGAGGAGGCCGAGCTG 

GAGCTGGCCGAGAACCGCGAGATCCTGAAGGAGCCCGTGCACGGCGTGTACTACGACCCCAG 

CAAGGACCTGATCGCCGAGATCCAGAAGCAGGGCAACGACGA.GTGGACCTACCAGATCTACC 

AGGAGCCCTTCAAGAACCTGCGCACCGGCAAGTACGCCAAGATGCGCACCGCCCACACCAAC 

GACGTGAAGCAGCTGGCCGAGGCCGTGCAGAAGATCACCCAGGAGAGCATCGTGATCTGGGG 

CAAGACCCCCAAGTTCCGCCTGCCCATCCCCAAGGAGACCTGGGAGACCTGGTGGAGCGACT 

ACTGGCAGGCCACCTGGATCCCCGAGTGGGAGTTCGTGAACACCCCCCCGCTGGTGAAGCTGT 

GGTACCAGCTGGAGAAGGAGCCCATCGTGGGCGCCGAGACCTTCTACGTGGACGGCGCCGCC 

AACCGCGAGACCAAGATCGGCAAGGCCGGCTACGTGACCGACAAGGGCCGCCAGAAGGTGG 

TGAGCTTCACCGAGACCACCAACCAGAAGACCGAGCTGCAGGCCATCCAGCTGGCCCTGCAG 

GACAGCGGCCCCGAGGTGAACATCGTGACCGACAGCCAGTACGCCCTGGGCATCATCCAGGC 

CCAGCCCGACAAGAGCGAGAGCGAGCTGGTGAGCCAGATCATCGAGCAGCTGATCAAGAAG 

GAGAAGGTGTACCTGAGCTGGGTGCCCGCCCACAAGGGCATCGGCGGCAACGAGCAGGTGGA 

CAAGCTGGTGAGCAGCGGCATCCGCAAGGTGCTGTTCCTGGACGGCATCGACAAGGCCCAGG 

aggagcacgagaagtaccacagcaactggcgcgccatggccagcgagttcaacctgcccccc 

atcgtggccaaggagatcgtggccagctgcgacaagtgccagctgaagggcgaggccatgca

cggccaggtggactgcagccccggcatctggcagctggactgcacccacctggagggcaaga 

tcatcctggtggccgtgcacgtggccagcggctacatggaggccgaggtgatccccgccgag 

accggccaggagaccgcctacttcatcctgaagctggccggccgctggcccgtgaaggtgatc

cacaccgacaacggcagcaacttcaccagcaccgccgtgaaggccgcctgctggtgggccga 

CATCCAGCGCGAGTTCGGCATCCCCTACAACCCCCAGAGCCAGGGCGTGGTGGAGAGCATGA 

acaaggagctgaagaagatcatcggccaggtgcgcgaccaggccgagcacctgaagaccgcc 

gtgcagatggccgtgttcatccacaacttcaagcgcaagggcggcatcggcggctacagcgc 

cggcgagcgcatcatcgacatcatcgccagcgacatccagaccaaggagctgcagaagcaga 

tcatcaagatccagaacttccgcgtgtactaccgcgacagccgcgaccccatctggaagggcc 

ccgccaagctgctgtggaagggcgagggcgccgtggtgatccaggacaacagcgacatcaag 

gtggtgccccgccgcaaggccaagatcatcaaggactacggcaagcagatggccggcgccga 

CTGCGTGGCCGGCCGCCAGGACGAGGAC 
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Pol_TV2_C_ZAwt (SEQ ID NO:104) 

ttttttagggaaaatttggccntcccac 

agagccaacagccccaccactagaaccaacagccccaccagcagagagcttcaagttcaagg 

agactccgaagcaggagccgaaagacagggaacctttaacttccctcaaatcac^

gcgaccccttgtctcaataaaagtagcgggccaaacaaaggaggctctt^

cagatgatacagtactagaagaaataaacntgccaggaaaatggaaaccaaaaatgatagg 

aggaattggaggttttatcaaagtaagacagt^^ 

aagggctataggtacagtattagtaggacc^^ 

gactcagcttggatgcacactaaattttc

aaagccaggaatggatggcccaaaggttaaacaatggccattgacagaagaaaaaataaaa
gcattaacagaaatttgtgaggaaatggaga^ 

aaatccatataacactccagtatttgccataaagaagaa^^cagtacaaagtggagaaaat 

tagtagatttcagggaactcaataaaagaactcaagacttttgggaagtc^ 

ccacacccagcagggttaaaaaagaaaaaatcagtgacagtactggatgtgggagatgcata 

TTTTTCAGTCCCTTTAGATGAGAGCTTCAGAAAATATACTGCATTCACCAT^ 
AATGAAACACCAGGGATTAGATATCAATATAATGTTCn-CCACAGGGATGGAAAGGATCACC 
AGCAATATTCCAGAGTAGCATGACAAGAATCTTAGAGCCCTTTAGAACACAAAACCCAGAAG 
TAGTTATCTATCAATATATGGATGACTTATATG^^ 

GAGCAAAAATAGAGGAGTTAAGAGGACACCTATTGAAATGGGGATTTACCACACCAGACAAG 
AAACATCAGAAAGAACCCCCATTTCTTTGGATGGGGTATGAACTCCATCCTGACAAA 
GTACAGCCTATACAGCTGCCAGAAAAGGAGAGCTGGACTGTCAATGATATACAGAAGTTAGT 
GGGAAAGTTAAACTGGGCAAGTCAGATTTACCCAGGGATO 

TCCTTAGGGGAGCCAAAGCACTAACAGACATAGTGCCACTGACTGAAGAAGCAGAATTAGAA 

TTGGCTGAGAACAGGGAAATTCTAAAAGAACCAGTACATGGAGTATATTATGACCCATCAAA 

AGATTTAATAGCTGAAATACAGAAACAGGGGAATGACCAATGGACATATCAAATTTACCAAG 

AACCATTTAAAAATCTGAGAACAGGAAAGTATGCAAAAATGAGGACTG^ 

GTGAAACAGTTAGCAGAGGCAGTGCAAAAGATAACCCAGGAAAGCATAGTAATATGGGGAA 

AAACTCCTAAATTTAGACTACCCATCCCAAuAAGAAACATGGGAGACATGGTGGTCAGACTATT 

GGCAAGCGACCTGGATTCCTGAGTGGGAGTTTGTCAATACCC 

ACCAGCTGGAAAAAGAACCCATAGTAGGGGCAGAAACTTTCTATGTAGATGGAGCAGCCAAT 
AGGGAAACTAAAATAGGAAAAGCAGGGTATGTCACT^ 

CrrCACTGAAACAACAAATCAGAAGACTGAATTACAAGCAATTCAGCTAGCm 

AGGGCCAGAAGTAAACATAGTAACAGACTCACAGTATGCATTAGGAATCATTCAAGCACAAC 

CAGATAAGAGTGAATCAGAATTAGTCAGTCAAATAATAGAACAGTTGATAAAAAAGGAAAAA 

GTCTACCTATCATGGGTACCAGCACATAAAGGAATTGGAGGAAATGAACAAGTAGACAA^ 

AGTAAGTAGTGGAATCAGAAAAGTACTGTTTCTAGATGGAATAGATAAAGCTCAAGAAGAGC 

ATGAAAAATATCACAGCAATTGGAGAGCAATGGCTAGTGAGTTTAATCTGCCACCCATAGTA 

GCAAAGGAAATAGTAGCCAGCTGTGATAAATGTCAGCTAAAAGGGGAAGCCATGCATGGACA 

AGTCGACTGTAGTCCAGGAATATGGCAATTAGACTGTACACATTTAGAAGGAAAAATCATCCT 

AGTAGCAGTCCATGTAGCCAGTGGCTACATGGAAGCAGAGGTTATCCCAGCAGAAACAGGAC 

AAGAAACAGCATACTTTATACTAAAATTAGCAGGAAGATGGCCAGTCAAAGTAATACATACA 

GATAATGGCAGTAATTTCACCAGTACCGCAGTTAAGGCAGCCTGTTGGTGGGCAGATATCCAA 

CGGGAATTTGGAATTCCCTACAATCCCCAAAGTCAAGGAGTAGTAGAATCCATC 

ATTAAAGAAAATCATAGGGCAAGTAAGAGATCAAGCTGAG

TGGCAGTATTCATTCACAATTTTAAAAGAAAAGGGGGGATTGGGGGGTA 

AGAATAATAGACATAATAGCATCAGAC^TACAAACTAAAGAATTACAAAAACAAATTATA^ 

AATTCAAAATTTTCGGGTTTATTACAGAGACAGCAGAGACCCTATTTGGAAA^

ACTACTCTGGAAAGGTGAAGGGGCAGTAGTAATACAAGATAATAGTGATATAAAGGTAGTAC 

G\AGAAGGAAAGCAAAAATCATTAAGGACTATGGAAAACAGATGGCAGGTGCT 

GCAGGTAGACAGGATGAAGAT 
RevExon1_TV2_C_ZAopt (SEQ ID NO: 105)

ATGGCCGGCCGCAGCGGCGACAGCGACGAGGCCCTGCTGCAGGCCATCAAG 
ATCATCAAGATCCTGTACCAGAGC 

FIGURE 76 




RevExon1_TV2_C_ZAwt (SEQ ID NO: 106)

ATGGCAGGAAGAAGCGGAGACAGCGACGAAGCGCTCCTCCAAGCAATAAAG 
ATCATCAAGATCCTCTACCAAAGCA 
^: regions for β-sheet deletions

*: N-linked glycosylation sites for subtype C TV1 and TV2. Each site can be removed by mutation (N->Q) or by deletion.
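The sites marked "*" above are N-linked glycosylation sites, which in proteins generally follow the consensus sequon Asn-X-Ser/Thr, where X is any residue except Pro. A minimal sketch, assuming the standard genetic code and that consensus rule (it does not read the "*" marks from the figure, and the helper names `translate`, `find_sequons` and `n_to_q` are hypothetical), shows how candidate sites in an in-frame coding sequence could be located and then removed by the N->Q substitution mentioned above, at the protein level only.

```python
from itertools import product
import re

# Standard genetic code, laid out in TCAG order (first base varies slowest).
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}

# Assumed consensus sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. "NNT") be reported separately.
_SEQUON = re.compile(r"N(?=[^P][ST])")


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; whitespace is ignored, unknown codons become 'X'."""
    seq = "".join(dna.split()).upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon (X != P)."""
    return [m.start() + 1 for m in _SEQUON.finditer(protein.upper())]


def n_to_q(protein: str, positions: list[int]) -> str:
    """Replace Asn with Gln at the given 1-based positions (protein level only)."""
    residues = list(protein)
    for pos in positions:
        if residues[pos - 1] != "N":
            raise ValueError(f"residue {pos} is {residues[pos - 1]!r}, not 'N'")
        residues[pos - 1] = "Q"
    return "".join(residues)


if __name__ == "__main__":
    # 48-nt in-frame fragment copied from the block numbered 421 of the
    # codon-optimized TV1 env sequences in this listing (SEQ ID NO: 132/133).
    fragment = "accgtgaccg gcaacagcac caacaacacc aacggcaccg gcatctac"
    protein = translate(fragment)
    sites = find_sequons(protein)
    print(protein, sites, n_to_q(protein, sites))
    # prints: TVTGNSTNNTNGTGIY [5, 8, 11] TVTGQSTQNTQGTGIY
```

A lookahead regex is used rather than a plain three-residue match so that adjacent sequons sharing residues are not skipped; a real design would also need to choose the replacement codon for each N->Q change in the DNA.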
NefD125G-Myr_TV2_C_ZAopt (SEQ ID NO: 135)



ATGGCCGGCAAGTGGAGCAAGAGCAGCATCATCGGCTGGCCCGAGGTGCGC 

GAGCGCATCCGCCGCACCCGCAGCGCCGCCGAGGGCGTGGGCAGCGCCAGC 

CAGGACCTGGAGAAGCACGGCGCCCTGACCACCAGCAACACCGCCCACAAC 

AACGCCGCCTGCGCCTGGCTGGAGGCCCAGGAGGAGGAGGGCGAGGTGGGC 

TTCCCCGTGCGCCCCCAGGTGCCCCTGCGCCCCATGACCTACAAGGCCGCCAT 

CGACCTGAGCTTCTTCCTGAAGGAGAAGGGCGGCCTGGAGGGCCTGATCTAC

AGCAAGAAGCGCCAGGAGATCCTGGACCTGTGGGTGTACAACACCCAGGGC

TTCTTCCCCGGCTGGCAGAACTACACCCCCGGCCCCGGCGTGCGCTTCCCCCT 

GACCTTCGGCTGGTACTTCAAGCTGGAGCCCGTGGACCCCCGCGAGGTGGAG 

GAGGCCAACGAGGGCGAGAACAACTGCCTGCTGCACCCCATGAGCCAGCAC 

GGCATGGAGGACGAGGACCGCGAGGTGCTGCGCTGGAAGTTCGACAGCACC 

CTGGCCCGCCGCCACATGGCCCGCGAGCTGCACCCCGAGTACTACAAGGACT 

GCTGA 
NefD125G_TV2_C_ZAopt (SEQ ID NO: 134)



ATGGGCGGCAAGTGGAGCAAGAGCAGCATCATCGGCTGGCCCGAGGTGCGC 

GAGCGCATCCGCCGCACCCGCAGCGCCGCCGAGGGCGTGGGCAGCGCCAGC 

CAGGACCTGGAGAAGCACGGCGCCCTGACCACCAGCAACACCGCCCACAAC 

AACGCCGCCTGCGCCTGGCTGGAGGCCCAGGAGGAGGAGGGCGAGGTGGGC

TTCCCCGTGCGCCCCCAGGTGCCCCTGCGCCCCATGACCTACAAGGCCGCCAT 

CGACCTGAGCTTCTTCCTGAAGGAGAAGGGCGGCCTGGAGGGCCTGATCTAC

AGCAAGAAGCGCCAGGAGATCCTGGACCTGTGGGTGTACAACACCCAGGGG- 

TTCTTCCCCGGCTGGCAGAACTACACCCCCGGCCCCGGCGTGCGCTTCCCCCT 

GACCTTCGGCTGGTACTTCAAGCTGGAGCCCGTGGACCCCCGCGAGGTGGAG 

GAGGCCAACGAGGGCGAGAACAACTGCCTGCTGCACCCCATGAGCCAGCAC 

GGCATGGAGGACGAGGACCGCGAGGTGCTGCGCTGGAAGTTCGACAGCACC 

CTGGCCCGCCGCCACATGGCCCGCGAGCTGCACCCCGAGTACTACAAGGACT 

GCTGA 
gpWOmoiTV1.wtLnative (SEQ ID NO: 133)

1 gaattcatga gagtgatggg gacacagaag aattgtcaac aatggtggat atggggcatc 
61 ttaggcttct ggatgctaat gatttgtaac accgaggacc tgtgggtgac cgtgtactac 
121 ggcgtgcccg tgtggcgcga cgccaagacc accctgttct gcgccagcga cgccaaggcc 
181 tacgagaccg aggtgcacaa cgtgtgggcc acccacgcct gcgtgcccac cgaccccaac 
241 ccccaggaga tcgtgctggg caacgtgacc gagaacttca acatgtggaa gaacgacatg 
301 gccgaccaga tgcacgagga cgtgatcagc ctgtgggacc agagcctgaa gccctgcgtg 
361 aagctgaccc ccctgtgcgt gaccctgaac tgcaccgaca ccaacgtgac cggcaaccgc 
421 accgtgaccg gcaacagcac caacaacacc aacggcaccg gcatctacaa catcgaggag 
481 atgaagaact gcagcttcaa cgccaccacc gagctgcgcg acaagaagca caaggagtac 
541 gccctgttct accgcctgga catcgtgccc ctgaacgaga acagcgacaa cttcacctac 
601 cgcctgatca actgcaacac cagcaccatc acccaggcct gccccaaggt gagcttcgac 
661 cccatcccca tccactactg cgcccccgcc ggctacgcca tcctgaagtg caacaacaag 
721 accttcaacg gcaccggccc ctgctacaac gtgagcaccg tgcagtgcac ccacggcatc 
781 aagcccgtgg tgagcaccca gctgctgctg aacggcagcc tggccgagga gggcatcatc 
841 atccgcagcg agaacctgac cgagaacacc aagaccatca tcgtgcacct gaacgagagc 
901 gtggagatca actgcacccg ccccaacaac aacacccgca agagcgtgcg catcggcccc 
961 ggccaggcct tctacgccac caacgacgtg atcggcaaca tccgccaggc ccactgcaac 
1021 atcagcaccg accgctggaa caagaccctg cagcaggtga tgaagaagct gggcgagcac 
1081 ttccccaaca agaccatcca gttcaagccc cacgccggcg gcgacctgga gatcaccatg 
1141 cacagcttca actgccgcgg cgagttcttc tactgcaaca ccagcaacct gttcaacagc 
1201 acctaccaca gcaacaacgg cacctacaag tacaacggca acagcagcag ccccatcacc 
1261 ctgcagtgca agatcaagca gatcgtgcgc atgtggcagg gcgtgggcca ggccacctac 
1321 gcccccccca tcgccggcaa catcacctgc cgcagcaaca tcaccggcat cctgctgacc
1381 cgcgacggcg gcttcaacac caccaacaac accgagacct tccgccccgg cggcggcgac 
1441 atgcgcgaca actggcgcag cgagctgtac aagtacaagg tggtggagat caagcccctg 
1501 ggcatcgccc ccaccaaggc caagcgccgc gtggtgcagc gcgagaagcg cgccgtgggc 
1561 atcggcgccg tgttcctggg cttcctgggc gccgccggca gcaccatggg cgccgccagc 
1621 atcaccctga ccgtgcaggc ccgccagctg ctgagcggca tcgtgcagca gcagagcaac 
1681 ctgctgaagg ccatcgaggc ccagcagcac atgctgcagc tgaccgtgtg gggcatcaag 
1741 cagctgcagg cccgcgtgct ggccatcgag cgctacctga aggaccagca gctgctgggc 
1801 atctggggct gcagcggccg cctgatctgc accaccgccg tgccctggaa cagcagctgg 
1861 agcaacaaga gcgagaagga catctgggac aacatgacct ggatgcagtg ggaccgcgag 
1921 atcagcaact acaccggcct gatctacaac ctgctggagg acagccagaa ccagcaggag 
1981 aagaacgaga aggacctgct ggagctggac aagtggaaca acctgtggaa ctggttcgac 
2041 atcagcaact ggccctggta catctaactc gag 
gpHOmocLTV1 (SEQ ID NO: 132)

1 gaattcatgc gcgtgatggg cacccagaag aactgccagc agtggtggat ctggggcatc
61 ctgggcttct ggatgctgat gatctgcaac accgaggacc tgtgggtgac cgtgtactac 
121 ggcgtgcccg tgtggcgcga cgccaagacc accctgttct gcgccagcga cgccaaggcc 
181 tacgagaccg aggtgcacaa cgtgtgggcc acccacgcct gcgtgcccac cgaccccaac 
241 ccccaggaga tcgtgctggg caacgtgacc gagaacttca acatgtggaa gaacgacatg 
301 gccgaccaga tgcacgagga cgtgatcagc ctgtgggacc agagcctgaa gccctgcgtg 
361 aagctgaccc ccctgtgcgt gaccctgaac tgcaccgaca ccaacgtgac cggcaaccgc 
421 accgtgaccg gcaacagcac caacaacacc aacggcaccg gcatctacaa catcgaggag 
481 atgaagaact gcagcttcaa cgccaccacc gagctgcgcg acaagaagca caaggagtac 
541 gccctgttct accgcctgga catcgtgccc ctgaacgaga acagcgacaa cttcacctac 
601 cgcctgatca actgcaacac cagcaccatc acccaggcct gccccaaggt gagcttcgac 
661 cccatcccca tccactactg cgcccccgcc ggctacgcca tcctgaagtg caacaacaag 
721 accttcaacg gcaccggccc ctgctacaac gtgagcaccg tgcagtgcac ccacggcatc 
781 aagcccgtgg tgagcaccca gctgctgctg aacggcagcc tggccgagga gggcatcatc 
841 atccgcagcg agaacctgac cgagaacacc aagaccatca tcgtgcacct gaacgagagc 
901 gtggagatca actgcacccg ccccaacaac aacacccgca agagcgtgcg catcggcccc 
961 ggccaggcct tctacgccac caacgacgtg atcggcaaca tccgccaggc ccactgcaac 
1021 atcagcaccg accgctggaa caagaccctg cagcaggtga tgaagaagct gggcgagcac 
1081 ttccccaaca agaccatcca gttcaagccc cacgccggcg gcgacctgga gatcaccatg 
1141 cacagcttca actgccgcgg cgagttcttc tactgcaaca ccagcaacct gttcaacagc 
1201 acctaccaca gcaacaacgg cacctacaag tacaacggca acagcagcag ccccatcacc 
1261 ctgcagtgca agatcaagca gatcgtgcgc atgtggcagg gcgtgggcca ggccacctac 
1321 gcccccccca tcgccggcaa catcacctgc cgcagcaaca tcaccggcat cctgctgacc 
1381 cgcgacggcg gcttcaacac caccaacaac accgagacct tccgccccgg cggcggcgac 
1441 atgcgcgaca actggcgcag cgagctgtac aagtacaagg tggtggagat caagcccctg 
1501 ggcatcgccc ccaccaaggc caagcgccgc gtggtgcagc gcgagaagcg cgccgtgggc 
1561 atcggcgccg tgttcctggg cttcctgggc gccgccggca gcaccatggg cgccgccagc 
1621 atcaccctga ccgtgcaggc ccgccagctg ctgagcggca tcgtgcagca gcagagcaac 
1681 ctgctgaagg ccatcgaggc ccagcagcac atgctgcagc tgaccgtgtg gggcatcaag 
1741 cagctgcagg cccgcgtgct ggccatcgag cgctacctga aggaccagca gctgctgggc 
1801 atctggggct gcagcggccg cctgatctgc accaccgccg tgccctggaa cagcagctgg 
1861 agcaacaaga gcgagaagga catctgggac aacatgacct ggatgcagtg ggaccgcgag 
1921 atcagcaact acaccggcct gatctacaac ctgctggagg acagccagaa ccagcaggag 
1981 aagaacgaga aggacctgct ggagctggac aagtggaaca acctgtggaa ctggttcgac 
2041 atcagcaact ggccctggta catctaactc gag
gpHOmocLTVLtpal (SEQ ID NO: 131)

1 atggatgcaa tgaagagagg gctctgctgt gtgctgctgc tgtgtggagc agtcttcgtt 
61 tcgcccagcg ccagcaccga ggacctgtgg gtgaccgtgt actacggcgt gcccgtgtgg 
121 cgcgacgcca agaccaccct gttctgcgcc agcgacgcca aggcctacga gaccgaggtg 
181 cacaacgtgt gggccaccca cgcctgcgtg cccaccgacc ccaaccccca ggagatcgtg 
241 ctgggcaacg tgaccgagaa cttcaacatg tggaagaacg acatggccga ccagatgcac 
301 gaggacgtga tcagcctgtg ggaccagagc ctgaagccct gcgtgaagct gacccccctg 
361 tgcgtgaccc tgaactgcac cgacaccaac gtgaccggca accgcaccgt gaccggcaac 
421 agcaccaaca acaccaacgg caccggcatc tacaacatcg aggagatgaa gaactgcagc 
481 ttcaacgcca ccaccgagct gcgcgacaag aagcacaagg agtacgccct gttctaccgc 
541 ctggacatcg tgcccctgaa cgagaacagc gacaacttca cctaccgcct gatcaactgc 
601 aacaccagca ccatcaccca ggcctgcccc aaggtgagct tcgaccccat ccccatccac 
661 tactgcgccc ccgccggcta cgccatcctg aagtgcaaca acaagacctt caacggcacc 
721 ggcccctgct acaacgtgag caccgtgcag tgcacccacg gcatcaagcc cgtggtgagc 
781 acccagctgc tgctgaacgg cagcctggcc gaggagggca tcatcatccg cagcgagaac 
841 ctgaccgaga acaccaagac catcatcgtg cacctgaacg agagcgtgga gatcaactgc 
901 acccgcccca acaacaacac ccgcaagagc gtgcgcatcg gccccggcca ggccttctac 
961 gccaccaacg acgtgatcgg caacatccgc caggcccact gcaacatcag caccgaccgc 
1021 tggaacaaga ccctgcagca ggtgatgaag aagctgggcg agcacttccc caacaagacc 
1081 atccagttca agccccacgc cggcggcgac ctggagatca ccatgcacag cttcaactgc 
1141 cgcggcgagt tcttctactg caacaccagc aacctgttca acagcaccta ccacagcaac
1201 aacggcacct acaagtacaa cggcaacagc agcagcccca tcaccctgca gtgcaagatc 
1261 aagcagatcg tgcgcatgtg gcagggcgtg ggccaggcca cctacgcccc ccccatcgcc 
1321 ggcaacatca cctgccgcag caacatcacc ggcatcctgc tgacccgcga cggcggcttc 
1381 aacaccacca acaacaccga gaccttccgc cccggcggcg gcgacatgcg cgacaactgg 
1441 cgcagcgagc tgtacaagta caaggtggtg gagatcaagc ccctgggcat cgcccccacc 
1501 aaggccaagc gccgcgtggt gcagcgcgag aagcgcgccg tgggcatcgg cgccgtgttc 
1561 ctgggcttcc tgggcgccgc cggcagcacc atgggcgccg ccagcatcac cctgaccgtg 
1621 caggcccgcc agctgctgag cggcatcgtg cagcagcaga gcaacctgct gaaggccatc 
1681 gaggcccagc agcacatgct gcagctgacc gtgtggggca tcaagcagct gcaggcccgc 
1741 gtgctggcca tcgagcgcta cctgaaggac cagcagctgc tgggcatctg gggctgcagc 
1801 ggccgcctga tctgcaccac cgccgtgccc tggaacagca gctggagcaa caagagcgag 
1861 aaggacatct gggacaacat gacctggatg cagtgggacc gcgagatcag caactacacc 
1921 ggcctgatct acaacctgct ggaggacagc cagaaccagc aggagaagaa cgagaaggac 
1981 ctgctggagc tggacaagtg gaacaacctg tggaactggt tcgacatcag caactggccc 
2041 tggtacatct aa 
1 atgagagtga tggggacaca gaagaattgt caacaatggt ggatatgggg catcttaggc
61 ttctggatgc taatgatttg taacacggag gacttgtggg tcacagtcta ctatggggta 
121 cctgtgtgga gagacgcaaa aactactcta ttctgtgcat cagatgctaa agcatatgag 
181 acagaagtgc ataatgtctg ggctacacat gcctgtgtac ccacagaccc caacccacaa 
241 gaaatagttt tgggaaatgt aacagaaaat tttaatatgt ggaaaaatga catggcagat
301 cagatgcatg aggatgtaat cagtttatgg gatcaaagcc taaagccatg tgtaaagttg 
361 accccactct gtgtcacttt aaactgtaca gatacaaatg ttacaggtaa tagaactgtt 
421 acaggtaata gtaccaataa tacaaatggt acaggtattt ataacattga agaaatgaaa 
481 aattgctctt tcaatgcaac cacagaatta agagataaga aacataaaga gtatgcactc 
541 ttttatagac ttgatatagt accacttaat gagaatagtg acaactttac atatagatta 
601 ataaattgca atacctcaac cataacacaa gcctgtccaa aggtctcttt tgacccgatt 
661 cctatacatt actgtgctcc agctggttat gcgattctaa agtgtaataa taagacattc 
721 aatgggacag gaccatgtta taatgtcagc acagtacaat gtacacatgg aattaagcca 
781 gtggtatcaa ctcaattact gttaaatggt agtctagcag aagaagggat aataattaga 
841 tctgaaaatt tgacagagaa taccaaaaca ataatagtac accttaatga atctgtagag 
901 attaattgta caagacccaa caataataca agaaaaagtg taaggatagg accaggacaa 
961 gcattctatg caacaaatga tgtaatagga aacataagac aagcacattg taacattagt 
1021 acagatagat ggaacaaaac tttacaacag gtaatgaaaa aattaggaga gcatttccct 
1081 aataaaacaa tacaatttaa accacatgca ggaggggatc tagaaattac aatgcatagc 
1141 tttaattgta gaggagaatt tttctattgt aatacatcaa acctgtttaa tagcacatac 
1201 cactctaata atggtacata caaatacaat ggtaattcaa gctcacccat cacactccaa 
1261 tgtaaaataa aacaaattgt acgcatgtgg caaggggtag gacaagcaac gtatgcccct 
1321 cccattgcag gaaacataac atgtagatca aacatcacag gaatactatt gacacgtgat 
1381 ggaggattta acaccacaaa caacacagag acattcagac ctggaggagg agatatgagg 
1441 gataactgga gaagtgaatt atataaatat aaagtagtag aaattaagcc attgggaata 
1501 gcacccacta aggcaaaaag aagagtggtg cagagagaaa aaagagcagt gggaatagga 
1561 gctgtgttcc ttgggttctt gggagcagca ggaagcacta tgggcgcagc gtcaataacg 
1621 ctgacggtac aggccagaca actgttgtct ggtatagtgc aacagcaaag caatttgctg 
1681 aaggctatag aggcgcaaca gcatatgttg caactcacag tctggggcat taagcagctc 
1741 caggcgagag tcctggctat agaaagatac ctaaaggatc aacagctcct agggatttgg 
1801 ggctgctctg gaagactcat ctgcaccact gctgtgcctt ggaactccag ttggagtaat 
1861 aaatctgaaa aagatatttg ggataacatg acttggatgc agtgggatag agaaattagt 
1921 aattacacag gcttaatata caatttgctt gaagactcgc aaaaccagca ggaaaagaat 
1981 gaaaaagatt tattagaatt ggacaagtgg aacaatctgt ggaattggtt tgacatatca 
2041 aactggccgt ggtatataaa aatattcata atgatagtag gaggcttgat aggtttaaga 
2101 ataatttttg ctgtgctttc tatagtgaat agagttaggc agggatactc acctttgtca 
2161 tttcagaccc ttaccccaag cccgagggga ctcgacaggc tcggaggaat cgaagaagaa 
2221 ggtggagagc aagacagaga cagatccata cgattggtga gcggattctt gtcgcttgcc 
2281 tgggacgatc tgcggaacct gtgcctcttc agctaccacc gcttgagaga cttcatatta 
2341 attgcagtga gggcagtgga acttctggga cacagcagtc tcaggggact acagaggggg 
2401 tgggaaatcc ttaagtatct gggaagtctt gtgcaatatt ggggtctaga gctaaaaaag 
2461 agtgctatta gtctgcttga taccatagca ataacagtag ctgaaggaac agataggatt 
2521 atagaattag tacaaagaat ttgtagagct atcctcaaca tacctagaag aataagacag 
2581 ggctttgaag cagctttgct ataa 
1 gaattcatga gagtgatggg gacacagaag aattgtcaac aatggtggat atggggcatc
61 ttaggcttct ggatgctaat gatttgtaac accgaggacc tgtgggtgac cgtgtactac 
121 ggcgtgcccg tgtggcgcga cgccaagacc accctgttct gcgccagcga cgccaaggcc 
181 tacgagaccg aggtgcacaa cgtgtgggcc acccacgcct gcgtgcccac cgaccccaac 
241 ccccaggaga tcgtgctggg caacgtgacc gagaacttca acatgtggaa gaacgacatg 
301 gccgaccaga tgcacgagga cgtgatcagc ctgtgggacc agagcctgaa gccctgcgtg 
361 aagctgaccc ccctgtgcgt gaccctgaac tgcaccgaca ccaacgtgac cggcaaccgc 
421 accgtgaccg gcaacagcac caacaacacc aacggcaccg gcatctacaa catcgaggag 
481 atgaagaact gcagcttcaa cgccaccacc gagctgcgcg acaagaagca caaggagtac 
541 gccctgttct accgcctgga catcgtgccc ctgaacgaga acagcgacaa cttcacctac
601 cgcctgatca actgcaacac cagcaccatc acccaggcct gccccaaggt gagcttcgac 
661 cccatcccca tccactactg cgcccccgcc ggctacgcca tcctgaagtg caacaacaag
721 accttcaacg gcaccggccc ctgctacaac gtgagcaccg tgcagtgcac ccacggcatc 
781 aagcccgtgg tgagcaccca gctgctgctg aacggcagcc tggccgagga gggcatcatc 
841 atccgcagcg agaacctgac cgagaacacc aagaccatca tcgtgcacct gaacgagagc 
901 gtggagatca actgcacccg ccccaacaac aacacccgca agagcgtgcg catcggcccc 
961 ggccaggcct tctacgccac caacgacgtg atcggcaaca tccgccaggc ccactgcaac 
1021 atcagcaccg accgctggaa caagaccctg cagcaggtga tgaagaagct gggcgagcac 
1081 ttccccaaca agaccatcca gttcaagccc cacgccggcg gcgacctgga gatcaccatg 
1141 cacagcttca actgccgcgg cgagttcttc tactgcaaca ccagcaacct gttcaacagc
1201 acctaccaca gcaacaacgg cacctacaag tacaacggca acagcagcag ccccatcacc 
1261 ctgcagtgca agatcaagca gatcgtgcgc atgtggcagg gcgtgggcca ggccacctac 
1321 gcccccccca tcgccggcaa catcacctgc cgcagcaaca tcaccggcat cctgctgacc 
1381 cgcgacggcg gcttcaacac caccaacaac accgagacct tccgccccgg cggcggcgac 
1441 atgcgcgaca actggcgcag cgagctgtac aagtacaagg tggtggagat caagcccctg 
1501 ggcatcgccc ccaccaaggc caagcgccgc gtggtgcagc gcgagaagcg cgccgtgggc 
1561 atcggcgccg tgttcctggg cttcctgggc gccgccggca gcaccatggg cgccgccagc 
1621 atcaccctga ccgtgcaggc ccgccagctg ctgagcggca tcgtgcagca gcagagcaac 
1681 ctgctgaagg ccatcgaggc ccagcagcac atgctgcagc tgaccgtgtg gggcatcaag 
1741 cagctgcagg cccgcgtgct ggccatcgag cgctacctga aggaccagca gctgctgggc 
1801 atctggggct gcagcggccg cctgatctgc accaccgccg tgccctggaa cagcagctgg 
1861 agcaacaaga gcgagaagga catctgggac aacatgacct ggatgcagtg ggaccgcgag 
1921 atcagcaact acaccggcct gatctacaac ctgctggagg acagccagaa ccagcaggag 
1981 aagaacgaga aggacctgct ggagctggac aagtggaaca acctgtggaa ctggttcgac 
2041 atcagcaact ggccctggta catcaagatc ttcatcatga tcgtgggcgg cctgatcggc
2101 ctgcgcatca tcttcgccgt gctgagcatc gtgaaccgcg tgcgccaggg ctacagcccc 
2161 ctgagcttcc agaccctgac ccccagcccc cgcggcctgg accgcctggg cggcatcgag
2221 gaggagggcg gcgagcagga ccgcgaccgc agcatccgcc tggtgagcgg cttcctgagc 
2281 ctggcctggg acgacctgcg caacctgtgc ctgttcagct accaccgcct gcgcgacttc 
2341 atcctgatcg ccgtgcgcgc cgtggagctg ctgggccaca gcagcctgcg cggcctgcag 
2401 cgcggctggg agatcctgaa gtacctgggc agcctggtgc agtactgggg cctggagctg 
2461 aagaagagcg ccatcagcct gctggacacc atcgccatca ccgtggccga gggcaccgac 
2521 cgcatcatcg agctggtgca gcgcatctgc cgcgccatcc tgaacatccc ccgccgcatc 
2581 cgccagggct tcgaggccgc cctgctgtaa ctcgag
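Each numbered listing here uses a GenBank-style layout: a 1-based start position followed by blocks of ten bases. The following is a minimal Python sketch, assuming exactly that layout. It joins such lines into one string and checks that each start position follows on from the bases already read. A mismatch, or a stray non-nucleotide character, usually points to a dropped, duplicated or OCR-damaged block. The function name `parse_numbered` is hypothetical.

```python
import re

def parse_numbered(lines):
    """Join GenBank-style numbered sequence lines into one lowercase string.

    Each line is assumed to look like "<start> <bases> <bases> ...", where
    <start> is the 1-based position of the first base on that line.
    Raises ValueError on a missing position, a position that does not
    follow on from the bases already read, or non-ACGT characters.
    """
    chunks = []
    length = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines left over from page breaks
        m = re.match(r"(\d+)\s*(.*)", line)
        if m is None:
            raise ValueError(f"no position number: {line!r}")
        start = int(m.group(1))
        bases = re.sub(r"\s+", "", m.group(2)).lower()
        if start != length + 1:
            raise ValueError(f"expected position {length + 1}, got {start}")
        if not re.fullmatch(r"[acgt]*", bases):
            raise ValueError(f"non-nucleotide characters in {bases!r}")
        chunks.append(bases)
        length += len(bases)
    return "".join(chunks)
```

A split coordinate such as `1 141` or a fused one such as `2161ctgag...` behaves differently under this check. The first raises a `ValueError`, because the position no longer matches. The second is accepted, because whitespace after the number is optional.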

1 gaattcatgc gcgtgatggg cacccagaag aactgccagc agtggtggat ctggggcatc
61 ctgggcttct ggatgctgat gatctgcaac accgaggacc tgtgggtgac cgtgtactac 
121 ggcgtgcccg tgtggcgcga cgccaagacc accctgttct gcgccagcga cgccaaggcc 
181 tacgagaccg aggtgcacaa cgtgtgggcc acccacgcct gcgtgcccac cgaccccaac 
241 ccccaggaga tcgtgctggg caacgtgacc gagaacttca acatgtggaa gaacgacatg 
301 gccgaccaga tgcacgagga cgtgatcagc ctgtgggacc agagcctgaa gccctgcgtg 
361 aagctgaccc ccctgtgcgt gaccctgaac tgcaccgaca ccaacgtgac cggcaaccgc 
421 accgtgaccg gcaacagcac caacaacacc aacggcaccg gcatctacaa catcgaggag 
481 atgaagaact gcagcttcaa cgccaccacc gagctgcgcg acaagaagca caaggagtac 
541 gccctgttct accgcctgga catcgtgccc ctgaacgaga acagcgacaa cttcacctac 
601 cgcctgatca actgcaacac cagcaccatc acccaggcct gccccaaggt gagcttcgac 
661 cccatcccca tccactactg cgcccccgcc ggctacgcca tcctgaagtg caacaacaag 
721 accttcaacg gcaccggccc ctgctacaac gtgagcaccg tgcagtgcac ccacggcatc 
781 aagcccgtgg tgagcaccca gctgctgctg aacggcagcc tggccgagga gggcatcatc 
841 atccgcagcg agaacctgac cgagaacacc aagaccatca tcgtgcacct gaacgagagc 
901 gtggagatca actgcacccg ccccaacaac aacacccgca agagcgtgcg catcggcccc 
961 ggccaggcct tctacgccac caacgacgtg atcggcaaca tccgccaggc ccactgcaac 
1021 atcagcaccg accgctggaa caagaccctg cagcaggtga tgaagaagct gggcgagcac 
1081 ttccccaaca agaccatcca gttcaagccc cacgccggcg gcgacctgga gatcaccatg 
1141 cacagcttca actgccgcgg cgagttcttc tactgcaaca ccagcaacct gttcaacagc 
1201 acctaccaca gcaacaacgg cacctacaag tacaacggca acagcagcag ccccatcacc
1261 ctgcagtgca agatcaagca gatcgtgcgc atgtggcagg gcgtgggcca ggccacctac
1321 gcccccccca tcgccggcaa catcacctgc cgcagcaaca tcaccggcat cctgctgacc 
1381 cgcgacggcg gcttcaacac caccaacaac accgagacct tccgccccgg cggcggcgac 
1441 atgcgcgaca actggcgcag cgagctgtac aagtacaagg tggtggagat caagcccctg 
1501 ggcatcgccc ccaccaaggc caagcgccgc gtggtgcagc gcgagaagcg cgccgtgggc 
1561 atcggcgccg tgttcctggg cttcctgggc gccgccggca gcaccatggg cgccgccagc 
1621 atcaccctga ccgtgcaggc ccgccagctg ctgagcggca tcgtgcagca gcagagcaac 
1681 ctgctgaagg ccatcgaggc ccagcagcac atgctgcagc tgaccgtgtg gggcatcaag 
1741 cagctgcagg cccgcgtgct ggccatcgag cgctacctga aggaccagca gctgctgggc 
1801 atctggggct gcagcggccg cctgatctgc accaccgccg tgccctggaa cagcagctgg 
1861 agcaacaaga gcgagaagga catctgggac aacatgacct ggatgcagtg ggaccgcgag 
1921 atcagcaact acaccggcct gatctacaac ctgctggagg acagccagaa ccagcaggag 
1981 aagaacgaga aggacctgct ggagctggac aagtggaaca acctgtggaa ctggttcgac 
2041 atcagcaact ggccctggta catcaagatc ttcatcatga tcgtgggcgg cctgatcggc 
2101 ctgcgcatca tcttcgccgt gctgagcatc gtgaaccgcg tgcgccaggg ctacagcccc 
2161 ctgagcttcc agaccctgac ccccagcccc cgcggcctgg accgcctggg cggcatcgag 
2221 gaggagggcg gcgagcagga ccgcgaccgc agcatccgcc tggtgagcgg cttcctgagc 
2281 ctggcctggg acgacctgcg caacctgtgc ctgttcagct accaccgcct gcgcgacttc 
2341 atcctgatcg ccgtgcgcgc cgtggagctg ctgggccaca gcagcctgcg cggcctgcag 
2401 cgcggctggg agatcctgaa gtacctgggc agcctggtgc agtactgggg cctggagctg 
2461 aagaagagcg ccatcagcct gctggacacc atcgccatca ccgtggccga gggcaccgac 
2521 cgcatcatcg agctggtgca gcgcatctgc cgcgccatcc tgaacatccc ccgccgcatc 
2581 cgccagggct tcgaggccgc cctgctgtaa ctcgag

1 gtcgacgcca ccatggatgc aatgaagaga gggctctgct gtgtgctgct gctgtgtgga
61 gcagtcttcg tttcgcccag cgccagcacc gaggacctgt gggtgaccgt gtactacggc 
121 gtgcccgtgt ggcgcgacgc caagaccacc ctgttctgcg ccagcgacgc caaggcctac 
181 gagaccgagg tgcacaacgt gtgggccacc cacgcctgcg tgcccaccga ccccaacccc 
241 caggagatcg tgctgggcaa cgtgaccgag aacttcaaca tgtggaagaa cgacatggcc 
301 gaccagatgc acgaggacgt gatcagcctg tgggaccaga gcctgaagcc ctgcgtgaag 
361 ctgacccccc tgtgcgtgac cctgaactgc accgacacca acgtgaccgg caaccgcacc
421 gtgaccggca acagcaccaa caacaccaac ggcaccggca tctacaacat cgaggagatg
481 aagaactgca gcttcaacgc caccaccgag ctgcgcgaca agaagcacaa ggagtacgcc 
541 ctgttctacc gcctggacat cgtgcccctg aacgagaaca gcgacaactt cacctaccgc 
601 ctgatcaact gcaacaccag caccatcacc caggcctgcc ccaaggtgag cttcgacccc 
661 atccccatcc actactgcgc ccccgccggc tacgccatcc tgaagtgcaa caacaagacc 
721 ttcaacggca ccggcccctg ctacaacgtg agcaccgtgc agtgcaccca cggcatcaag 
781 cccgtggtga gcacccagct gctgctgaac ggcagcctgg ccgaggaggg catcatcatc 
841 cgcagcgaga acctgaccga gaacaccaag accatcatcg tgcacctgaa cgagagcgtg 
901 gagatcaact gcacccgccc caacaacaac acccgcaaga gcgtgcgcat cggccccggc 
961 caggccttct acgccaccaa cgacgtgatc ggcaacatcc gccaggccca ctgcaacatc 
1021 agcaccgacc gctggaacaa gaccctgcag caggtgatga agaagctggg cgagcacttc 
1081 cccaacaaga ccatccagtt caagccccac gccggcggcg acctggagat caccatgcac 
1141 agcttcaact gccgcggcga gttcttctac tgcaacacca gcaacctgtt caacagcacc 
1201 taccacagca acaacggcac ctacaagtac aacggcaaca gcagcagccc catcaccctg 
1261 cagtgcaaga tcaagcagat cgtgcgcatg tggcagggcg tgggccaggc cacctacgcc 
1321 ccccccatcg ccggcaacat cacctgccgc agcaacatca ccggcatcct gctgacccgc 
1381 gacggcggct tcaacaccac caacaacacc gagaccttcc gccccggcgg cggcgacatg 
1441 cgcgacaact ggcgcagcga gctgtacaag tacaaggtgg tggagatcaa gcccctgggc 
1501 atcgccccca ccaaggccaa gcgccgcgtg gtgcagcgcg agaagcgcgc cgtgggcatc 
1561 ggcgccgtgt tcctgggctt cctgggcgcc gccggcagca ccatgggcgc cgccagcatc 
1621 accctgaccg tgcaggcccg ccagctgctg agcggcatcg tgcagcagca gagcaacctg 
1681 ctgaaggcca tcgaggccca gcagcacatg ctgcagctga ccgtgtgggg catcaagcag 
1741 ctgcaggccc gcgtgctggc catcgagcgc tacctgaagg accagcagct gctgggcatc 
1801 tggggctgca gcggccgcct gatctgcacc accgccgtgc cctggaacag cagctggagc 
1861 aacaagagcg agaaggacat ctgggacaac atgacctgga tgcagtggga ccgcgagatc
1921 agcaactaca ccggcctgat ctacaacctg ctggaggaca gccagaacca gcaggagaag 
1981 aacgagaagg acctgctgga gctggacaag tggaacaacc tgtggaactg gttcgacatc 
2041 agcaactggc cctggtacat caagatcttc atcatgatcg tgggcggcct gatcggcctg 
2101 cgcatcatct tcgccgtgct gagcatcgtg aaccgcgtgc gccagggcta cagccccctg 
2161 agcttccaga ccctgacccc cagcccccgc ggcctggacc gcctgggcgg catcgaggag 
2221 gagggcggcg agcaggaccg cgaccgcagc atccgcctgg tgagcggctt cctgagcctg 
2281 gcctgggacg acctgcgcaa cctgtgcctg ttcagctacc accgcctgcg cgacttcatc 
2341 ctgatcgccg tgcgcgccgt ggagctgctg ggccacagca gcctgcgcgg cctgcagcgc 
2401 ggctgggaga tcctgaagta cctgggcagc ctggtgcagt actggggcct ggagctgaag 
2461 aagagcgcca tcagcctgct ggacaccatc gccatcaccg tggccgaggg caccgaccgc 
2521 atcatcgagc tggtgcagcg catctgccgc gccatcctga acatcccccg ccgcatccgc 
2581 cagggcttcg aggccgccct gctgtaactc gag 
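Several constructs in this listing begin with `gaattc` or `gtcgac` and end with `ctcgag`. These are the recognition sequences of EcoRI, SalI and XhoI respectively. The listing itself does not say whether they serve as cloning sites, so treat that reading as an assumption. The short Python sketch below reports the 0-based positions of these three motifs. All three sites are palindromic, so scanning one strand is enough. The names `SITES` and `find_sites` are illustrative only.

```python
# Recognition sequences (5'->3') of three common restriction enzymes.
# Each is its own reverse complement, so one-strand scanning suffices.
SITES = {"EcoRI": "gaattc", "SalI": "gtcgac", "XhoI": "ctcgag"}

def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for motifs found in seq."""
    seq = seq.lower()
    hits = {}
    for name, motif in sites.items():
        positions = []
        i = seq.find(motif)
        while i != -1:
            positions.append(i)
            i = seq.find(motif, i + 1)  # step by one so overlaps are kept
        if positions:
            hits[name] = positions
    return hits
```

Run on a full construct, the output shows whether a flanking site also occurs internally. That matters if the site is meant to be unique.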

1 gaattcatgc gcgtgatggg cacccagaag aactgccagc agtggtggat ctggggcatc
61 ctgggcttct ggatgctgat gatctgcaac accgaggacc tgtgggtgac cgtgtactac 
121 ggcgtgcccg tgtggcgcga cgccaagacc accctgttct gcgccagcga cgccaaggcc 
181 tacgagaccg aggtgcacaa cgtgtgggcc acccacgcct gcgtgcccac cgaccccaac 
241 ccccaggaga tcgtgctggg caacgtgacc gagaacttca acatgtggaa gaacgacatg 
301 gccgaccaga tgcacgagga cgtgatcagc ctgtgggacc agagcctgaa gccctgcgtg 
361 aagctgaccc ccctgtgcgt gaccctgaac tgcaccgaca ccaacgtgac cggcaaccgc 
421 accgtgaccg gcaacagcac caacaacacc aacggcaccg gcatctacaa catcgaggag 
481 atgaagaact gcagcttcaa cgccggcgcc ggccgcctga tcaactgcaa caccagcacc
541 atcacccagg cctgccccaa ggtgagcttc gaccccatcc ccatccacta ctgcgccccc 
601 gccggctacg ccatcctgaa gtgcaacaac aagaccttca acggcaccgg cccctgctac 
661 aacgtgagca ccgtgcagtg cacccacggc atcaagcccg tggtgagcac ccagctgctg 
721 ctgaacggca gcctggccga ggagggcatc atcatccgca gcgagaacct gaccgagaac 
781 accaagacca tcatcgtgca cctgaacgag agcgtggaga tcaactgcac ccgccccaac 
841 aacaacaccc gcaagagcgt gcgcatcggc cccggccagg ccttctacgc caccaacgac 
901 gtgatcggca acatccgcca ggcccactgc aacatcagca ccgaccgctg gaacaagacc 
961 ctgcagcagg tgatgaagaa gctgggcgag cacttcccca acaagaccat ccagttcaag 
1021 ccccacgccg gcggcgacct ggagatcacc atgcacagct tcaactgccg cggcgagttc 
1081 ttctactgca acaccagcaa cctgttcaac agcacctacc acagcaacaa cggcacctac 
1141 aagtacaacg gcaacagcag cagccccatc accctgcagt gcaagatcaa gcagatcgtg 
1201 cgcatgtggc agggcgtggg ccaggccacc tacgcccccc ccatcgccgg caacatcacc 
1261 tgccgcagca acatcaccgg catcctgctg acccgcgacg gcggcttcaa caccaccaac 
1321 aacaccgaga ccttccgccc cggcggcggc gacatgcgcg acaactggcg cagcgagctg 
1381 tacaagtaca aggtggtgga gatcaagccc ctgggcatcg cccccaccaa ggccatcagc 
1441 agcgtggtgc agagcgagaa gagcgccgtg ggcatcggcg ccgtgttcct gggcttcctg 
1501 ggcgccgccg gcagcaccat gggcgccgcc agcatcaccc tgaccgtgca ggcccgccag 
1561 ctgctgagcg gcatcgtgca gcagcagagc aacctgctga aggccatcga ggcccagcag 
1621 cacatgctgc agctgaccgt gtggggcatc aagcagctgc aggcccgcgt gctggccatc 
1681 gagcgctacc tgaaggacca gcagctgctg ggcatctggg gctgcagcgg ccgcctgatc 
1741 tgcaccaccg ccgtgccctg gaacagcagc tggagcaaca agagcgagaa ggacatctgg 
1801 gacaacatga cctggatgca gtgggaccgc gagatcagca actacaccgg cctgatctac 
1861 aacctgctgg aggacagcca gaaccagcag gagaagaacg agaaggacct gctggagctg 
1921 gacaagtgga acaacctgtg gaactggttc gacatcagca actggccctg gtacatcaag
1981 atcttcatca tgatcgtggg cggcctgatc ggcctgcgca tcatcttcgc cgtgctgagc 
2041 atcgtgaacc gcgtgcgcca gggctacagc cccctgagct tccagaccct gacccccagc
2101 ccccgcggcc tggaccgcct gggcggcatc gaggaggagg gcggcgagca ggaccgcgac
2161 cgcagcatcc gcctggtgag cggcttcctg agcctggcct gggacgacct gcgcaacctg 
2221 tgcctgttca gctaccaccg cctgcgcgac ttcatcctga tcgccgtgcg cgccgtggag 
2281 ctgctgggcc acagcagcct gcgcggcctg cagcgcggct gggagatcct gaagtacctg 
2341 ggcagcctgg tgcagtactg gggcctggag ctgaagaaga gcgccatcag cctgctggac 
2401 accatcgcca tcaccgtggc cgagggcacc gaccgcatca tcgagctggt gcagcgcatc 
2461 tgccgcgcca tcctgaacat cccccgccgc atccgccagg gcttcgaggc cgccctgctg 
2521 taactcgag 

1 gaattcatgc gcgtgatggg cacccagaag aactgccagc agtggtggat ctggggcatc
61 ctgggcttct ggatgctgat gatctgcaac accgaggacc tgtgggtgac cgtgtactac 
121 ggcgtgcccg tgtggcgcga cgccaagacc accctgttct gcgccagcga cgccaaggcc 
181 tacgagaccg aggtgcacaa cgtgtgggcc acccacgcct gcgtgcccac cgaccccaac 
241 ccccaggaga tcgtgctggg caacgtgacc gagaacttca acatgtggaa gaacgacatg 
301 gccgaccaga tgcacgagga cgtgatcagc ctgtgggacc agagcctgaa gccctgcgtg 
361 aagctgaccc ccctgtgcgt gaccctgaac tgcaccgaca ccaacgtgac cggcaaccgc 
421 accgtgaccg gcaacagcac caacaacacc aacggcaccg gcatctacaa catcgaggag 
481 atgaagaact gcagcttcaa cgccggcgcc ggccgcctga tcaactgcaa caccagcacc 
541 atcacccagg cctgccccaa ggtgagcttc gaccccatcc ccatccacta ctgcgccccc 
601 gccggctacg ccatcctgaa gtgcaacaac aagaccttca acggcaccgg cccctgctac 
661 aacgtgagca ccgtgcagtg cacccacggc atcaagcccg tggtgagcac ccagctgctg
721 ctgaacggca gcctggccga ggagggcatc atcatccgca gcgagaacct gaccgagaac
781 accaagacca tcatcgtgca cctgaacgag agcgtggaga tcaactgcac ccgccccaac 
841 aacaacaccc gcaagagcgt gcgcatcggc cccggccagg ccttctacgc caccaacgac 
901 gtgatcggca acatccgcca ggcccactgc aacatcagca ccgaccgctg gaacaagacc 
961 ctgcagcagg tgatgaagaa gctgggcgag cacttcccca acaagaccat ccagttcaag 
1021 ccccacgccg gcggcgacct ggagatcacc atgcacagct tcaactgccg cggcgagttc 
1081 ttctactgca acaccagcaa cctgttcaac agcacctacc acagcaacaa cggcacctac 
1141 aagtacaacg gcaacagcag cagccccatc accctgcagt gcaagatcaa gcagatcgtg 
1201 cgcatgtggc agggcgtggg ccaggccacc tacgcccccc ccatcgccgg caacatcacc 
1261 tgccgcagca acatcaccgg catcctgctg acccgcgacg gcggcttcaa caccaccaac 
1321 aacaccgaga ccttccgccc cggcggcggc gacatgcgcg acaactggcg cagcgagctg
1381 tacaagtaca aggtggtgga gatcaagccc ctgggcatcg cccccaccaa ggccaagcgc 
1441 cgcgtggtgc agcgcgagaa gcgcgccgtg ggcatcggcg ccgtgttcct gggcttcctg 
1501 ggcgccgccg gcagcaccat gggcgccgcc agcatcaccc tgaccgtgca ggcccgccag 
1561 ctgctgagcg gcatcgtgca gcagcagagc aacctgctga aggccatcga ggcccagcag 
1621 cacatgctgc agctgaccgt gtggggcatc aagcagctgc aggcccgcgt gctggccatc 
1681 gagcgctacc tgaaggacca gcagctgctg ggcatctggg gctgcagcgg ccgcctgatc 
1741 tgcaccaccg ccgtgccctg gaacagcagc tggagcaaca agagcgagaa ggacatctgg 
1801 gacaacatga cctggatgca gtgggaccgc gagatcagca actacaccgg cctgatctac
1861 aacctgctgg aggacagcca gaaccagcag gagaagaacg agaaggacct gctggagctg 
1921 gacaagtgga acaacctgtg gaactggttc gacatcagca actggccctg gtacatcaag 
1981 atcttcatca tgatcgtggg cggcctgatc ggcctgcgca tcatcttcgc cgtgctgagc 
2041 atcgtgaacc gcgtgcgcca gggctacagc cccctgagct tccagaccct gacccccagc 
2101 ccccgcggcc tggaccgcct gggcggcatc gaggaggagg gcggcgagca ggaccgcgac 
2161 cgcagcatcc gcctggtgag cggcttcctg agcctggcct gggacgacct gcgcaacctg 
2221 tgcctgttca gctaccaccg cctgcgcgac ttcatcctga tcgccgtgcg cgccgtggag
2281 ctgctgggcc acagcagcct gcgcggcctg cagcgcggct gggagatcct gaagtacctg 
2341 ggcagcctgg tgcagtactg gggcctggag ctgaagaaga gcgccatcag cctgctggac 
2401 accatcgcca tcaccgtggc cgagggcacc gaccgcatca tcgagctggt gcagcgcatc 
2461 tgccgcgcca tcctgaacat cccccgccgc atccgccagg gcttcgaggc cgccctgctg 
2521 taactcgag 

1 gaattcatgc gcgtgatggg cacccagaag aactgccagc agtggtggat ctggggcatc
61 ctgggcttct ggatgctgat gatctgcaac accgaggacc tgtgggtgac cgtgtactac 
121 ggcgtgcccg tgtggcgcga cgccaagacc accctgttct gcgccagcga cgccaaggcc 
181 tacgagaccg aggtgcacaa cgtgtgggcc acccacgcct gcgtgcccac cgaccccaac 
241 ccccaggaga tcgtgctggg caacgtgacc gagaacttca acatgtggaa gaacgacatg 
301 gccgaccaga tgcacgagga cgtgatcagc ctgtgggacc agagcctgaa gccctgcgtg 
361 aagctgaccc ccctgtgcgt gggcgccggc aactgcaaca ccagcaccat cacccaggcc 
421 tgccccaagg tgagcttcga ccccatcccc atccactact gcgcccccgc cggctacgcc 
481 atcctgaagt gcaacaacaa gaccttcaac ggcaccggcc cctgctacaa cgtgagcacc 
541 gtgcagtgca cccacggcat caagcccgtg gtgagcaccc agctgctgct gaacggcagc 
601 ctggccgagg agggcatcat catccgcagc gagaacctga ccgagaacac caagaccatc 
661 atcgtgcacc tgaacgagag cgtggagatc aactgcaccc gccccaacaa caacacccgc 
721 aagagcgtgc gcatcggccc cggccaggcc ttctacgcca ccaacgacgt gatcggcaac 
781 atccgccagg cccactgcaa catcagcacc gaccgctgga acaagaccct gcagcaggtg 
841 atgaagaagc tgggcgagca cttccccaac aagaccatcc agttcaagcc ccacgccggc 
901 ggcgacctgg agatcaccat gcacagcttc aactgccgcg gcgagttctt ctactgcaac 
961 accagcaacc tgttcaacag cacctaccac agcaacaacg gcacctacaa gtacaacggc 
1021 aacagcagca gccccatcac cctgcagtgc aagatcaagc agatcgtgcg catgtggcag 
1081 ggcgtgggcc aggccaccta cgcccccccc atcgccggca acatcacctg ccgcagcaac 
1141 atcaccggca tcctgctgac ccgcgacggc ggcttcaaca ccaccaacaa caccgagacc 
1201 ttccgccccg gcggcggcga catgcgcgac aactggcgca gcgagctgta caagtacaag 
1261 gtggtggaga tcaagcccct gggcatcgcc cccaccaagg ccaagcgccg cgtggtgcag 
1321 cgcgagaagc gcgccgtggg catcggcgcc gtgttcctgg gcttcctggg cgccgccggc 
1381 agcaccatgg gcgccgccag catcaccctg accgtgcagg cccgccagct gctgagcggc 
1441 atcgtgcagc agcagagcaa cctgctgaag gccatcgagg cccagcagca catgctgcag 
1501 ctgaccgtgt ggggcatcaa gcagctgcag gcccgcgtgc tggccatcga gcgctacctg
1561 aaggaccagc agctgctggg catctggggc tgcagcggcc gcctgatctg caccaccgcc
1621 gtgccctgga acagcagctg gagcaacaag agcgagaagg acatctggga caacatgacc 
1681 tggatgcagt gggaccgcga gatcagcaac tacaccggcc tgatctacaa cctgctggag 
1741 gacagccaga accagcagga gaagaacgag aaggacctgc tggagctgga caagtggaac 
1801 aacctgtgga actggttcga catcagcaac tggccctggt acatcaagat cttcatcatg 
1861 atcgtgggcg gcctgatcgg cctgcgcatc atcttcgccg tgctgagcat cgtgaaccgc
1921 gtgcgccagg gctacagccc cctgagcttc cagaccctga cccccagccc ccgcggcctg 
1981 gaccgcctgg gcggcatcga ggaggagggc ggcgagcagg accgcgaccg cagcatccgc
2041 ctggtgagcg gcttcctgag cctggcctgg gacgacctgc gcaacctgtg cctgttcagc 
2101 taccaccgcc tgcgcgactt catcctgatc gccgtgcgcg ccgtggagct gctgggccac 
2161 agcagcctgc gcggcctgca gcgcggctgg gagatcctga agtacctggg cagcctggtg 
2221 cagtactggg gcctggagct gaagaagagc gccatcagcc tgctggacac catcgccatc 
2281 accgtggccg agggcaccga ccgcatcatc gagctggtgc agcgcatctg ccgcgccatc 
2341 ctgaacatcc cccgccgcat ccgccagggc ttcgaggccg ccctgctgta actcgag 

gp140mod.TV1.mut7.delV2 (SEQ ID NO:121)

1 gaattcatgc gcgtgatggg cacccagaag aactgccagc agtggtggat ctggggcatc 
61 ctgggcttct ggatgctgat gatctgcaac accgaggacc tgtgggtgac cgtgtactac 
121 ggcgtgcccg tgtggcgcga cgccaagacc accctgttct gcgccagcga cgccaaggcc 
181 tacgagaccg aggtgcacaa cgtgtgggcc acccacgcct gcgtgcccac cgaccccaac 
241 ccccaggaga tcgtgctggg caacgtgacc gagaacttca acatgtggaa gaacgacatg 
301 gccgaccaga tgcacgagga cgtgatcagc ctgtgggacc agagcctgaa gccctgcgtg 
361 aagctgaccc ccctgtgcgt gaccctgaac tgcaccgaca ccaacgtgac cggcaaccgc 
421 accgtgaccg gcaacagcac caacaacacc aacggcaccg gcatctacaa catcgaggag 
481 atgaagaact gcagcttcaa cgccggcgcc ggccgcctga tcaactgcaa caccagcacc 
541 atcacccagg cctgccccaa ggtgagcttc gaccccatcc ccatccacta ctgcgccccc 
601 gccggctacg ccatcctgaa gtgcaacaac aagaccttca acggcaccgg cccctgctac 
661 aacgtgagca ccgtgcagtg cacccacggc atcaagcccg tggtgagcac ccagctgctg 
721 ctgaacggca gcctggccga ggagggcatc atcatccgca gcgagaacct gaccgagaac 
781 accaagacca tcatcgtgca cctgaacgag agcgtggaga tcaactgcac ccgccccaac 
841 aacaacaccc gcaagagcgt gcgcatcggc cccggccagg ccttctacgc caccaacgac 
901 gtgatcggca acatccgcca ggcccactgc aacatcagca ccgaccgctg gaacaagacc 
961 ctgcagcagg tgatgaagaa gctgggcgag cacttcccca acaagaccat ccagttcaag 
1021 ccccacgccg gcggcgacct ggagatcacc atgcacagct tcaactgccg cggcgagttc 
1081 ttctactgca acaccagcaa cctgttcaac agcacctacc acagcaacaa cggcacctac 
1141 aagtacaacg gcaacagcag cagccccatc accctgcagt gcaagatcaa gcagatcgtg 
1201 cgcatgtggc agggcgtggg ccaggccacc tacgcccccc ccatcgccgg caacatcacc 
1261 tgccgcagca acatcaccgg catcctgctg acccgcgacg gcggcttcaa caccaccaac 
1321 aacaccgaga ccttccgccc cggcggcggc gacatgcgcg acaactggcg cagcgagctg 
1381 tacaagtaca aggtggtgga gatcaagccc ctgggcatcg cccccaccaa ggccatcagc 
1441 agcgtggtgc agagcgagaa gagcgccgtg ggcatcggcg ccgtgttcct gggcttcctg 
1501 ggcgccgccg gcagcaccat gggcgccgcc agcatcaccc tgaccgtgca ggcccgccag 
1561 ctgctgagcg gcatcgtgca gcagcagagc aacctgctga aggccatcga ggcccagcag 
1621 cacatgctgc agctgaccgt gtggggcatc aagcagctgc aggcccgcgt gctggccatc 
1681 gagcgctacc tgaaggacca gcagctgctg ggcatctggg gctgcagcgg ccgcctgatc 
1741 tgcaccaccg ccgtgccctg gaacagcagc tggagcaaca agagcgagaa ggacatctgg 
1801 gacaacatga cctggatgca gtgggaccgc gagatcagca actacaccgg cctgatctac 
1861 aacctgctgg aggacagcca gaaccagcag gagaagaacg agaaggacct gctggagctg 
1921 gacaagtgga acaacctgtg gaactggttc gacatcagca actggccctg gtacatctaa 
1981 ctcgag 

1 gaattcatgc gcgtgatggg cacccagaag aactgccagc agtggtggat ctggggcatc
61 ctgggcttct ggatgctgat gatctgcaac accgaggacc tgtgggtgac cgtgtactac 
121 ggcgtgcccg tgtggcgcga cgccaagacc accctgttct gcgccagcga cgccaaggcc 
181 tacgagaccg aggtgcacaa cgtgtgggcc acccacgcct gcgtgcccac cgaccccaac 
241 ccccaggaga tcgtgctggg caacgtgacc gagaacttca acatgtggaa gaacgacatg 
301 gccgaccaga tgcacgagga cgtgatcagc ctgtgggacc agagcctgaa gccctgcgtg
361 aagctgaccc ccctgtgcgt gaccctgaac tgcaccgaca ccaacgtgac cggcaaccgc
421 accgtgaccg gcaacagcac caacaacacc aacggcaccg gcatctacaa catcgaggag
481 atgaagaact gcagcttcaa cgccggcgcc ggccgcctga tcaactgcaa caccagcacc
541 atcacccagg cctgccccaa ggtgagcttc gaccccatcc ccatccacta ctgcgccccc 
601 gccggctacg ccatcctgaa gtgcaacaac aagaccttca acggcaccgg cccctgctac 
661 aacgtgagca ccgtgcagtg cacccacggc atcaagcccg tggtgagcac ccagctgctg 
721 ctgaacggca gcctggccga ggagggcatc atcatccgca gcgagaacct gaccgagaac 
781 accaagacca tcatcgtgca cctgaacgag agcgtggaga tcaactgcac ccgccccaac 
841 aacaacaccc gcaagagcgt gcgcatcggc cccggccagg ccttctacgc caccaacgac 
901 gtgatcggca acatccgcca ggcccactgc aacatcagca ccgaccgctg gaacaagacc 
961 ctgcagcagg tgatgaagaa gctgggcgag cacttcccca acaagaccat ccagttcaag
1021 ccccacgccg gcggcgacct ggagatcacc atgcacagct tcaactgccg cggcgagttc
1081 ttctactgca acaccagcaa cctgttcaac agcacctacc acagcaacaa cggcacctac 
1141 aagtacaacg gcaacagcag cagccccatc accctgcagt gcaagatcaa gcagatcgtg 
1201 cgcatgtggc agggcgtggg ccaggccacc tacgcccccc ccatcgccgg caacatcacc
1261 tgccgcagca acatcaccgg catcctgctg acccgcgacg gcggcttcaa caccaccaac
1321 aacaccgaga ccttccgccc cggcggcggc gacatgcgcg acaactggcg cagcgagctg 
1381 tacaagtaca aggtggtgga gatcaagccc ctgggcatcg cccccaccaa ggccaagcgc 
1441 cgcgtggtgc agcgcgagaa gcgcgccgtg ggcatcggcg ccgtgttcct gggcttcctg 
1501 ggcgccgccg gcagcaccat gggcgccgcc agcatcaccc tgaccgtgca ggcccgccag 
1561 ctgctgagcg gcatcgtgca gcagcagagc aacctgctga aggccatcga ggcccagcag 
1621 cacatgctgc agctgaccgt gtggggcatc aagcagctgc aggcccgcgt gctggccatc 
1681 gagcgctacc tgaaggacca gcagctgctg ggcatctggg gctgcagcgg ccgcctgatc 
1741 tgcaccaccg ccgtgccctg gaacagcagc tggagcaaca agagcgagaa ggacatctgg 
1801 gacaacatga cctggatgca gtgggaccgc gagatcagca actacaccgg cctgatctac 
1861 aacctgctgg aggacagcca gaaccagcag gagaagaacg agaaggacct gctggagctg 
1921 gacaagtgga acaacctgtg gaactggttc gacatcagca actggccctg gtacatctaa 
1981 ctcgag 

gp120mod.TV1.delV2 (SEQ ID NO:119)

1 gaattcatgc gcgtgatggg cacccagaag aactgccagc agtggtggat ctggggcatc 
61 ctgggcttct ggatgctgat gatctgcaac accgaggacc tgtgggtgac cgtgtactac 
121 ggcgtgcccg tgtggcgcga cgccaagacc accctgttct gcgccagcga cgccaaggcc 
181 tacgagaccg aggtgcacaa cgtgtgggcc acccacgcct gcgtgcccac cgaccccaac 
241 ccccaggaga tcgtgctggg caacgtgacc gagaacttca acatgtggaa gaacgacatg 
301 gccgaccaga tgcacgagga cgtgatcagc ctgtgggacc agagcctgaa gccctgcgtg 
361 aagctgaccc ccctgtgcgt gaccctgaac tgcaccgaca ccaacgtgac cggcaaccgc 
421 accgtgaccg gcaacagcac caacaacacc aacggcaccg gcatctacaa catcgaggag 
481 atgaagaact gcagcttcaa cgccggcgcc ggccgcctga tcaactgcaa caccagcacc 
541 atcacccagg cctgccccaa ggtgagcttc gaccccatcc ccatccacta ctgcgccccc 
601 gccggctacg ccatcctgaa gtgcaacaac aagaccttca acggcaccgg cccctgctac 
661 aacgtgagca ccgtgcagtg cacccacggc atcaagcccg tggtgagcac ccagctgctg 
721 ctgaacggca gcctggccga ggagggcatc atcatccgca gcgagaacct gaccgagaac 
781 accaagacca tcatcgtgca cctgaacgag agcgtggaga tcaactgcac ccgccccaac 
841 aacaacaccc gcaagagcgt gcgcatcggc cccggccagg ccttctacgc caccaacgac 
901 gtgatcggca acatccgcca ggcccactgc aacatcagca ccgaccgctg gaacaagacc 
961 ctgcagcagg tgatgaagaa gctgggcgag cacttcccca acaagaccat ccagttcaag 
1021 ccccacgccg gcggcgacct ggagatcacc atgcacagct tcaactgccg cggcgagttc 
1081 ttctactgca acaccagcaa cctgttcaac agcacctacc acagcaacaa cggcacctac
1141 aagtacaacg gcaacagcag cagccccatc accctgcagt gcaagatcaa gcagatcgtg 
1201 cgcatgtggc agggcgtggg ccaggccacc tacgcccccc ccatcgccgg caacatcacc 
1261 tgccgcagca acatcaccgg catcctgctg acccgcgacg gcggcttcaa caccaccaac 
1321 aacaccgaga ccttccgccc cggcggcggc gacatgcgcg acaactggcg cagcgagctg 
1381 tacaagtaca aggtggtgga gatcaagccc ctgggcatcg cccccaccaa ggccaagcgc 
1441 cgcgtggtgc agcgcgagaa gcgctaactc gag

Vpu_TV2_C_ZAwt (SEQ ID NO:118)



ATGTTAGATTTAACTGCAAGAATAGATTCTAGATTAGGAATAGGAGCATTGA 

TAGTAGCACTAATCATAGCAATAATAGTGTGGACCATAGTATATATAGAATA 

TAGGAAATTGGTAAGGCAAAGGAAAATAGACTGGTTAGTTAAAAGGATTAG 

GGAAAGAGCAGAAGACAGTGGCAATGAGAGCGAGGGGGATACTGAAGAATT 

ATCGACACTGGTGGATATGGGGCATCTTAGGCTTTTGGATGCTAATGATGTGT 

AA 

Vpu_TV2_C_ZAopt (SEQ ID NO:117)

ATGCTGGACCTGACCGCCCGCATCGACAGCCGCCTGGGCATCGGCGCCCTGA 

TCGTGGCCCTGATCATCGCCATGATCGTGTGGACCATCGTGTACATCGAGTAC 

CGCAAGCTGGTGCGCCAGCGCAAGATCGACTGGCTGGTGAAGCGCATCCGCG 

AGCGCGCCGAGGACAGCGGCAACGAGAGCGAGGGCGACACCGAGGAGCTGA 

GCACCCTGGTGGACATGGGCCACCTGCGCCTGCTGGACGCCAACGACGTGTA 
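Each `_ZAopt` sequence is paired with a `_ZAwt` counterpart. In the first lines of the Vpu pair, the optimized version replaces codons such as `TTA` (Leu) and `GAT` (Asp) with `CTG` and `GAC`, which raises the G+C fraction. The minimal sketch below measures that fraction. The function and constant names are illustrative, and the two strings are the first printed lines of SEQ ID NO:118 and SEQ ID NO:117.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in seq (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)

# First printed lines of Vpu_TV2_C_ZAwt and Vpu_TV2_C_ZAopt.
VPU_WT = "ATGTTAGATTTAACTGCAAGAATAGATTCTAGATTAGGAATAGGAGCATTGA"
VPU_OPT = "ATGCTGGACCTGACCGCCCGCATCGACAGCCGCCTGGGCATCGGCGCCCTGA"

if __name__ == "__main__":
    print(f"wt  GC fraction: {gc_content(VPU_WT):.2f}")
    print(f"opt GC fraction: {gc_content(VPU_OPT):.2f}")
```

This sketch compares only the first 52 bases of each sequence. A whole-gene comparison would need the full sequences joined first.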

Vpr_TV2_C_ZAwt (SEQ ID NO:116)




ATGGAACAAGCCCCAGAAGACCAGGGGCCGCAGAGGGAACCATACAATGAA 
T^CACTAGAGCTTTTAGAAGAACTCAAGCAGGAAGCTGTCAGACACmC 
CTAGACCATGGCTCCATAACTTAGGACAACATATCTATGAAACCTATGGAGA 
TAOTGGACAGGAGTTGAAGCAATAATAAGAATCCTGCAACAATTACTGTTT 
ATTCATTTCAGGATTGGGTGCCATCATAGCAGAATAGGCATTTTGCGACAGA 

GAAGAGCAAGAAATGGAGCCAATAGATCC 

Vpr_TV2_C_ZAopt (SEQ ID NO:115)

ATGGAGCAGGCCCCCGAGGACCAGGGCCCCCAGCGCGAGCCCTACAACGAG 

TGGACCCTGGAGCTGCTGGAGGAGCTGAAGCAGGAGGCCGTGCGCCACTTCC 

CCCGCCCCTGGCTGCACAACCTGGGCCAGCACATCTACGAGACCTACGGCGA 

CACCTGGACCGGCGTGGAGGCCATCATCCGCATCCTGCAGCAGCTGCTGTTC 

ATCCACTTCCGCATCGGCTGCCACCACAGCCGCATCGGCATCCTGCGCCAGC 

GCCGCGCCCGCAACGGCGCCAACCGCAGC 
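One quick consistency check for a wild-type/optimized pair is to translate both sequences and compare the protein strings. The following is a minimal sketch using the standard genetic code (NCBI translation table 1). It assumes each sequence starts in frame at its first base, and it does not describe how the listed sequences were designed. The names `translate` and `same_protein` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1). Codons are enumerated TTT, TTC, TTA,
# TTG, TCT, ... (third base varies fastest), so the k-th letter below is the
# amino acid for the k-th codon; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(seq: str) -> str:
    """Translate seq in frame 1.

    A trailing partial codon is ignored, and any codon containing a
    character other than A/C/G/T is rendered as 'X'.
    """
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))

def same_protein(wt: str, opt: str) -> bool:
    """True if the two nucleotide sequences encode the same amino acids."""
    return translate(wt) == translate(opt)
```

For the first twelve bases of the Vpr pair, `translate("ATGGAACAAGCC")` and `translate("ATGGAGCAGGCC")` both return `MEQA`. OCR damage in a wild-type listing shows up as `X` or as a shifted reading frame rather than as a silent mismatch.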

Vif_TV2_C_ZAwt (SEQ ID NO:114)



ATGGAAAACAGATGGCAGGTGCTGATTGTGTGGCAGGTAGACAGGATGAAG 

ATTAGAACATGGCACAGTTTAGTAAAGCACCATATGTATGTTTCGAGGAGAG 

CTGATGGATGGTTCTACAGACATCATTATGAAAGCAGACACCCAAAAGTAAG 

TTCAGAAGTACACATCCCATTAGGAGATGCCAGGTTAGTAATAAAAACATAT 

TGGGGTCTGCAGACAGGAGAAAGAGCTTGGCATTTGGGTCACGGAGTCTCCA 

TAGAATGGAGATTGAGAAGATATAGCACACAAGTAGACCCTGACCTGACAG 

ACCAACTAATTCATATGCATTATTTTGATTGTTTTGCAGAATCTGCCATAAGG

AAAGCCATACTAGGACAGATAGTTAGCCCTAAGTGTGACTATCAAGCAGGAC 

ATAACAAGGTAGGATCTCTACAATACTTGGCACTGACAGCATTGATAAAACC 

AAAAAAGATAAAGCCACCTCTGCCTAGTGTTAGGAAATTAGTAGAGGATAGA 

TGGAACAAGCCCCAGAAGACCAGGGGCCGCAGAGGGAACCATACAATGAAT 

GGACACTAG 



FIGURE 85 






Vif_TV2_C_ZAopt (SEQ ID NO:113)



ATGGAGAACCGCTGGCAGGTGCTGATGGTGTGGCAGGTGGACCGCATGAAGA 

TCCGCACCTGGCACAGCCTGGTGAAGCACCACATGTACGTGAGCCGCCGCGC 

CGACGGCTGGTTCTACCGCCACCACTACGAGAGCCGCCACCCCAAGGTGAGC 

AGCGAGGTGCACATCCCCCTGGGCGACGCCCGCCTGGTGATCAAGACCTACT 

GGGGCCTGCAGACCGGCGAGCGCGCCTGGCACCTGGGCCACGGCGTGAGCA 

TCGAGTGGCGCCTGCGCCGCTACAGCACCCAGGTGGACCCCGACCTGACCGA 

CCAGCTGATCCACATGCACTACTTCGACTGCTTCGCCGAGAGCGCCATCCGG

AAGGCCATCCTGGGCCAGATCGTGAGCCCCAAGTGCGACTACCAGGCCGGCC 

ACAACAAGGTGGGCAGCCTGCAGTACCTGGCCCTGACCGCCCTGATCAAGCC 

CAAGAAGATCAAGCCCCCCCTGCCCAGCGTGCGCAAGCTGGTGGAGGACCGC 

TGGAACAAGCCCCAGAAGACCCGCGGCCGCCGCGGCAACCACACCATGAAC 

GGCCACTAG 

TatExon2_TV2_C_ZAwt (SEQ ID NO:112)

CCCTTATCCCAAACCCGAGGGGACCCGACAGGCTCGGAGGAATCGAAGAAG 
AAGGTGGAGAGCAAGACAGCAGCAGATCCATTCGATTAG 

TatExon2_TV2_C_ZAopt (SEQ ID NO:111)

CCCCTGAGCCAGACCCGCGGCGACCCCACCGGCAGCGAGGAGAGCAAGAAG 
AAGGTGGAGAGCAAGACCGCCGCCGACCCCTTCGACTAG 

TatExon1_TV2_C_ZAwt (SEQ ID NO:110)

ATGGAGCCAATAGATCCTAACCTAGAACCCTGGAACCATCCAGGAAGTCAGC 
CTAAAACrGCTTGTAATGGGTGTTACTGTAAACGTTGCAGCTATCATTGTCTA 
GTTTGCTTTCAGAAAAAAGGCTTAGGCATTTACTATGGCAGGAAGAAGCGGA 
GACAGCGACGAAGCGCTCCTCCAAGCAATAAAGATCATCAAGATCCTCTACC 

AAAGCAG 

TatExon1_TV2_C_ZAopt (SEQ ID NO:109)

ATGGAGCCCATCGACCCCAACCTGGAGCCCTGGAACCACCCCGGCAGCCAGC 

CCAAGACCGCCTGCAACGGCTGCTACTGCAAGCGCTGCAGCTACCACTGCCT 

GGTGTGCTTCCAGAAGAAGGGCCTGGGCATCTACTACGGCCGCAAGAAGCGC 

CGCCAGCGCCGCAGCGCCCCCCCCAGCAACAAGGACCACCAGGACCCCCTGC 

CCAAGCAG 

RevExon2_TV2_C_ZAwt (SEQ ID NO:108)

ACCCTTATCCCAAACCCGAGGGGACCCGACAGGCTCGGAGGAATCGAAGAA 

GAAGGTGGAGAGCAAGACAGCAGCAGATCCATTCGATTAGTGAGCGGATTCT 

TGACACTTGCCTGGGACGACCTACGAAGCCTGTGCCTCTTCTGCTACCACCGA 

TTGAGAGACTTCATATTAATTGTAGTGAGAGCAGTGGAACTTCTGGGACACA 

GTAGTCTCAGGGGACTGCAGAGGGGGTGGGGAACCCTTAA 

RevExon2_TV2_C_ZAopt (SEQ ID NO:107)



CCCTACCCCAAGCCCGAGGGCACCCGCCAGGCCCGCCGCAACCGCCGCCGCC 

GCTGGCGCGCCCGCCAGCAGCAGATCCACAGCATCAGCGAGCGCATCCTGGA 

CACCTGCCTGGGCCGCCCCACCAAGCCCGTGCCCCTGCTGCTGCCCCCCATCG 

AGCGCCTGCACATCAACTGCAGCGAGAGCAGCGGCACCAGCGGCACCCAGT 

AGAGCCAGGGCACCGCCGAGGGCGTGGGCAACCCCTAA 